<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <!-- Source: dev.to/feed/ftisiot -->
  <channel>
    <title>DEV Community: Francesco Tisiot</title>
    <description>The latest articles on DEV Community by Francesco Tisiot (@ftisiot).</description>
    <link>https://siftrss.com/f/DkvP5gypPaB</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F571977%2F529fa0cd-c499-4dc7-b17f-461b46c68313.jpg</url>
      <title>DEV Community: Francesco Tisiot</title>
      <link>https://dev.to/ftisiot</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://siftrss.com/f/DkvP5gypPaB"/>
    <language>en</language>
    <item>
      <title>Introducing Developer Tier for Aiven for PostgreSQL® services</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Wed, 12 Nov 2025 14:10:00 +0000</pubDate>
      <link>https://dev.to/ftisiot/introducing-developer-tier-for-aiven-for-postgresqlr-services-1ik7</link>
      <guid>https://dev.to/ftisiot/introducing-developer-tier-for-aiven-for-postgresqlr-services-1ik7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Starting at $8 USD, the new Developer tier includes everything from the Free tier, with extra disk space, preserved uptime for idle services, and Basic support to keep you building without interruption.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Aiven is introducing a new pricing plan for Aiven for PostgreSQL® services. Starting at $8 USD per month, the Developer tier offers more storage, so you can scale up your free PostgreSQL service in a cost-effective way. Unlike the Free tier, services on the Developer tier are not automatically powered off if inactive. The Developer tier also automatically includes Basic support services.&lt;/p&gt;

&lt;p&gt;More information on the Developer tier is available in the &lt;a href="https://aiven.io/docs/platform/concepts/service-pricing#developer-plan" rel="noopener noreferrer"&gt;Aiven docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjc1273iwjv68gwxmbq3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjc1273iwjv68gwxmbq3.png" alt="The Aiven Console showing Free, Developer and Professional Tier choices" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What do I get with the Developer tier?
&lt;/h2&gt;

&lt;p&gt;Compared to the Free tier, you still get a single node, with 1 CPU per virtual machine, monitoring and backups. But beyond that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increased storage from 1GB to a maximum of 8GB.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aiven.io/docs/platform/howto/support" rel="noopener noreferrer"&gt;Basic tier support&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Inactive services don’t automatically power off, meaning that your database will always be up and running, even if there’s no traffic to it.
See &lt;a href="https://aiven.io/docs/platform/concepts/service-pricing#developer-plan" rel="noopener noreferrer"&gt;the documentation&lt;/a&gt; for full technical details.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your database needs are more specific, you can easily switch and scale to the Professional plans, where you get additional benefits including the ability to choose a specific cloud provider and region for your service, the option for more nodes, even more disk space, longer backup, integrations with other services, forking and much more. Find out more in the &lt;a href="https://aiven.io/docs/products/postgresql" rel="noopener noreferrer"&gt;Aiven for PostgreSQL documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why three tiers?
&lt;/h2&gt;

&lt;p&gt;We originally started out with our selection of paid service plans, which you can now find neatly grouped under the Professional tier.&lt;/p&gt;

&lt;p&gt;We introduced the Free tier as a way of paying back to our community, providing a way to start and host your non critical PostgreSQL infrastructure on Aiven, and creating a great way to learn about using a database in the cloud.&lt;/p&gt;

&lt;p&gt;Now we’ve introduced the Developer tier as a cost-effective choice between those two. The Developer tier inherits some traits from the free tier and enhances it with some features you might find useful for your more advanced needs at a lower price point compared to our Professional plans.&lt;/p&gt;

&lt;p&gt;You can choose the balance that suits your project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free&lt;/strong&gt; to explore and learn the platform and PostgreSQL in the cloud at no cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer&lt;/strong&gt; as a cost-effective option for test and personal projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Professional&lt;/strong&gt; for highly available business-critical workloads.
And as your project grows, there’s always an upgrade path available.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Questions you might ask
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Do I get to choose where the service runs?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As with the Free tier, you can choose the geographical area (Asia Pacific, Australia, Europe or North America), but not the cloud provider or region. If that’s a requirement, the selection is available in the Professional level plans.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Can I go (back) to the Free tier?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your service data will fit in the smaller instance size (1GB disk instead of 8GB), then it’s always possible to downgrade a service to the Free tier. See the documentation for how to &lt;a href="https://aiven.io/docs/platform/howto/scale-services" rel="noopener noreferrer"&gt;Change a service plan&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Can I upgrade to the Professional tier?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yes, it’s always possible to upgrade to a higher tier. See the documentation for how to &lt;a href="https://aiven.io/docs/platform/howto/scale-services" rel="noopener noreferrer"&gt;Change a service plan&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Should I use a Developer tier service in production?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As it says in the description in the Aiven Console, the Developer tier is intended to be “A cost-effective option for test and personal projects”.&lt;/p&gt;

&lt;p&gt;The Developer tier provides a working PostgreSQL environment with basic daily backup and Basic level support, but without advanced networking, point in time recovery, forking, integrations, or cloud provider and region selection. If any of the above features are interesting for you, then please look in the Professional plans.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Why is this just for PostgreSQL?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We started with adding this new tier for Aiven for PostgreSQL because we’re confident that our users will find it useful. We do have ideas for other services - keep tuned to see what we announce in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it out!
&lt;/h2&gt;

&lt;p&gt;If you’re creating a new Aiven for PostgreSQL service and know that a Free service isn’t enough, but you don’t need all the resources of the Professional “Hobbyist” plan, then start a Developer service instead.&lt;/p&gt;

&lt;p&gt;If you’ve got an existing Free tier Aiven for PostgreSQL service that could use just that bit more disk space, or that you only use intermittently, or if you want some support, then consider upgrading to the Developer tier.&lt;/p&gt;

&lt;p&gt;And of course, if you’ve got an existing Hobbyist service that feels over-provisioned, you’ve now got the option of downgrading to Developer (although please do read the documentation to check that everything you need is still provided).&lt;/p&gt;

&lt;p&gt;Find out more in the &lt;a href="https://console.aiven.io/" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt;, or check out &lt;a href="https://aiven.io/docs/platform/concepts/service-pricing#developer-plan" rel="noopener noreferrer"&gt;the documentation&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>announcements</category>
      <category>productupdates</category>
      <category>platform</category>
    </item>
    <item>
      <title>Free PostgreSQL® in the Cloud - Aug 25</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Thu, 14 Aug 2025 08:20:11 +0000</pubDate>
      <link>https://dev.to/ftisiot/free-postgresqlr-in-the-cloud-58bm</link>
      <guid>https://dev.to/ftisiot/free-postgresqlr-in-the-cloud-58bm</guid>
      <description>&lt;p&gt;A short summary of Free PostgreSQL® services in the cloud as of August 14th 2025&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Free Tier Highlights&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://aiven.io/free-postgresql-database" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1 CPU, 1 GB RAM, 1GB Storage&lt;/td&gt;
&lt;td&gt;Production-like prototyping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://neon.com/" rel="noopener noreferrer"&gt;Neon&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;10 projects, 0.5 GB storage, serverless, branching&lt;/td&gt;
&lt;td&gt;Modern development workflows, serverless apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="http://fly.io/" rel="noopener noreferrer"&gt;Fly.io&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;3 GB storage, 160 GB outbound&lt;/td&gt;
&lt;td&gt;Lightweight global apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://render.com/pricing#postgresql" rel="noopener noreferrer"&gt;Render.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;0.1 CPU, 256 RAM, 100 connections, daily backups, 30 day limit&lt;/td&gt;
&lt;td&gt;Small apps needing more capacity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://aws.amazon.com/free/database/" rel="noopener noreferrer"&gt;Amazon RDS&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;20 GB + backups, 750 hrs/month&lt;/td&gt;
&lt;td&gt;Temporary usage via AWS Free Tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://supabase.com/" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;500 MB storage&lt;/td&gt;
&lt;td&gt;Small prototypes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>postgres</category>
      <category>free</category>
    </item>
    <item>
      <title>Helping PostgreSQL® professionals with AI-assisted performance recommendations</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Tue, 28 May 2024 15:00:00 +0000</pubDate>
      <link>https://dev.to/ftisiot/helping-postgresqlr-professionals-with-ai-assisted-performance-recommendations-27k2</link>
      <guid>https://dev.to/ftisiot/helping-postgresqlr-professionals-with-ai-assisted-performance-recommendations-27k2</guid>
      <description>&lt;p&gt;Since the beginning of my journey into the data world I've been keen on making professionals better at their data job. In the previous years that took the shape of creating materials in several different forms that could help people understand, use, and avoid mistakes on their data tool of choice. But now there's much more into it: a trusted AI solution to help data professional in their day to day optimization job.&lt;/p&gt;

&lt;h2&gt;
  
  
  From content to tooling
&lt;/h2&gt;

&lt;p&gt;The big advantage of the content creation approach is the 1-N effect: the material, once created and updated, can serve a multitude of people interested in the same technology or facing the same problem. You write an article once, and it gets found and adopted by a vast amount of professionals.&lt;/p&gt;

&lt;p&gt;The limit of content tho, it's that it is an extra resource, that people need to find and read elsewhere. While this is useful, it forces a context switch, moving people away from the problem they are facing. Here is where tooling helps, providing assistance in the same IDE that professionals are using for their day to day work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tooling for database professionals
&lt;/h2&gt;

&lt;p&gt;I have the luxury of working for &lt;a href="https://aiven.io/" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt; which provides professionals an integrated platform for all their data needs. In the last three years I witnessed the growth of the platform and its evolution with the clear objective to make it better usable at scale. Tooling like &lt;a href="https://aiven.io/integrations-and-connectors" rel="noopener noreferrer"&gt;integrations&lt;/a&gt;, &lt;a href="https://aiven.io/docs/tools/terraform" rel="noopener noreferrer"&gt;Terraform providers&lt;/a&gt; and the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Console&lt;/a&gt; facilitate the work that platform administrators have to perform on daily basis.&lt;/p&gt;

&lt;p&gt;But what about the day to day work of developers and CloudOps teams? This was facilitated when dealing with administrative tasks like backups, creation of read only replicas or upgrades, but the day to day work of optimizing the workloads was still completely on their hands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using trusted AI to optimize database workloads
&lt;/h2&gt;

&lt;p&gt;This, however, is now changing. With the recent launch of &lt;a href="https://aiven.io/blog/aiven-ai-dboptimizer-launch" rel="noopener noreferrer"&gt;Aiven AI Database Optimizer&lt;/a&gt; we are able to help both developer and CloudOps in their day to day optimization work!&lt;/p&gt;

&lt;p&gt;Aiven AI Database Optimizer provides, directly in the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Console&lt;/a&gt;, insights on database workloads alongside a one-click Optimize button that suggests index and SQL rewrites. It's a non intrusive solution who gathers informations from the slow query log and database metadata, and leverages an in-built AI engine, to provide accurate suggestions to improve performance.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/fOavII9QAmg"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The solution, based on the &lt;a href="https://www.eversql.com/" rel="noopener noreferrer"&gt;EverSQL by Aiven&lt;/a&gt; technology has been already adopted by 120.000 professionals optimizing over 2 million queries. It's not a wrapper around a public Generative AI provider, is a dedicated solution that keeps data privacy and security as a priority.&lt;/p&gt;

&lt;p&gt;You can experience it for Free in the Early Availability phase, just navigate to the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt; and create an Aiven for PostgreSQL® service! Once you have some workload on the service, the &lt;strong&gt;AI Insights&lt;/strong&gt; page will show you the queries and provide index and SQL rewrite suggestion to take your database performance to the next level!&lt;/p&gt;

&lt;p&gt;More information on the dedicated &lt;a href="https://aiven.io/solutions/aiven-ai-database-optimizer" rel="noopener noreferrer"&gt;Aiven AI Database Optimizer page&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>sql</category>
      <category>performance</category>
      <category>ai</category>
    </item>
    <item>
      <title>Introducing Aiven's AI Database Optimizer: The First Built-In SQL Optimizer for Enhanced Performance</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Tue, 28 May 2024 12:00:00 +0000</pubDate>
      <link>https://dev.to/aiven_io/introducing-aivens-ai-database-optimizer-the-first-built-in-sql-optimizer-for-enhanced-performance-402b</link>
      <guid>https://dev.to/aiven_io/introducing-aivens-ai-database-optimizer-the-first-built-in-sql-optimizer-for-enhanced-performance-402b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgchfpvltsf4zlcic373r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgchfpvltsf4zlcic373r.png" alt="Hero Image" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An efficient data infrastructure is a vital component in building &amp;amp; operating scalable and performant applications that are widely adopted, satisfy customers, and ultimately, drive business growth. Unfortunately, the speed of new feature delivery coupled with a lack of database optimization knowledge is exposing organizations to high risk performance issues. The new &lt;a href="https://aiven.io/solutions/aiven-ai-database-optimizer?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;Aiven AI Database Optimizer&lt;/a&gt; helps organizations address performance both in the development and production phase, making it simple to quickly deploy, fully optimized, scalable, and cost efficient applications.&lt;/p&gt;

&lt;p&gt;Fully integrated with &lt;a href="https://aiven.io/postgresql?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;Aiven for PostgreSQL&lt;/a&gt;®, Aiven AI Database Optimizer offers AI-driven performance insights, index, and SQL rewrite suggestions to maximize database performance, minimize costs, and make the best out of your cloud investment. &lt;/p&gt;

&lt;h2&gt;
  
  
  How does AI Database Optimizer work?
&lt;/h2&gt;

&lt;p&gt;Aiven AI Database Optimizer is a non-intrusive solution powered by &lt;a href="https://www.eversql.com/?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;EverSQL by Aiven&lt;/a&gt; that gathers information about database workloads, metadata and supporting data structures, such as indexes. Information about the number of query executions and average query times are continually processed by a mix of heuristic and AI models to determine which SQL statements can be further optimized. AI Database Optimizer then delivers accurate, secure optimization suggestions that you can trust, and that can be adopted to speed up query performance.&lt;/p&gt;

&lt;p&gt;Recommendations from Aiven’s AI Database Optimizer are already trusted by over 120,000 users in organizations ranging from startups to the largest global enterprises, who have optimized more than 2 million queries to date.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does AI Database Optimizer help organizations?
&lt;/h2&gt;

&lt;p&gt;During development, AI Database Optimizer enables early performance testing, allowing easier redesign or refactoring of queries before they impact production. This enables customers to foster a culture of considering performance from the get-go, ensuring it is a priority throughout development rather than an afterthought.&lt;/p&gt;

&lt;p&gt;AI Database Optimizer also helps businesses gain an optimal user experience: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With fast query response times that ensure a smooth and responsive user experience, especially in data-intensive applications.&lt;/li&gt;
&lt;li&gt;By identifying and fixing performance bottlenecks organizations can reduce costs, avoid outages and deliver continuous service availability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A fleet of Aiven for PostgreSQL® databases is powering &lt;a href="https://www.youtube.com/watch?v=mPWxizlA3so" rel="noopener noreferrer"&gt;La Redoute&lt;/a&gt;’s marketplace functionality, driving 30% of their business. &lt;br&gt;
Diogo Passadouro - OPS-DBA Team Lead stated &lt;em&gt;"Aiven AI Database Optimizer has revolutionized the way we analyze database performance, providing a simple, clear and highly effective approach and has proven instrumental in enhancing the performance of our databases."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwsxlkj5bc36zcc1blpy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwsxlkj5bc36zcc1blpy.png" alt="Quote from Diogo Passadouro" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aiven.io/case-studies/conrad-electronic-expands-e-commerce-platform-with-aiven" rel="noopener noreferrer"&gt;Conrad&lt;/a&gt; is an advanced B2B sourcing platform selling 9 million products from 6,000 brands, powered by Aiven. Janek Wonner - Head of SRE &amp;amp; Cloud Technology stated &lt;em&gt;"Aiven for PostgreSQL is underpinning our fundamental company functionalities, we are looking forward to adopt Aiven AI database optimizer to empower our developers to create scalable code and empower our development teams with better performance insights and improvement suggestions to reduce the time to fix performance issues"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkc8t0qbybx2firird70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkc8t0qbybx2firird70.png" alt="Quote from Janek Wonner" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;More information is available in the &lt;a href="https://aiven.io/solutions/aiven-ai-database-optimizer?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;Aiven AI Database Optimizer page&lt;/a&gt;. You can experience it for yourself in any &lt;a href="https://aiven.io/postgresql" rel="noopener noreferrer"&gt;Aiven for PostgreSQL&lt;/a&gt; service for free during the Early Availability phase. Simply navigate to the “AI Insights” tab. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://console.aiven.io/signup?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;Try it now&lt;/a&gt; or &lt;a href="https://aiven.io/book-demo?utm_source=devto&amp;amp;utm_medium=organic&amp;amp;utm_campaign=DB-Optimizer" rel="noopener noreferrer"&gt;Contact us&lt;/a&gt; today to check it out!&lt;/p&gt;

</description>
      <category>announcements</category>
      <category>ai</category>
      <category>postgres</category>
    </item>
    <item>
      <title>How to use pgbench to test PostgreSQL® performance</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Tue, 12 Mar 2024 15:00:00 +0000</pubDate>
      <link>https://dev.to/ftisiot/how-to-use-pgbench-to-test-postgresqlr-performance-4m6h</link>
      <guid>https://dev.to/ftisiot/how-to-use-pgbench-to-test-postgresqlr-performance-4m6h</guid>
      <description>&lt;p&gt;Testing a database performance is a must in every company. Despite everyone's needs beings slightly different, a good starting point for PostgreSQL® database is using &lt;a href="https://www.postgresql.org/docs/current/pgbench.html" rel="noopener noreferrer"&gt;pgbench&lt;/a&gt;: a tool shipped with the PostgreSQL installation that allows you to stress test a local or remote database. &lt;br&gt;
This blog post showcases how to install (on a Mac) and use pgbench to create load on a remote PostgreSQL database on &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you need a FREE PostgreSQL database? &lt;br&gt;
🦀 Check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven's FREE plans&lt;/a&gt;! 🦀&lt;br&gt;
⚡️ Need to optimize your SQL query with AI? ⚡️&lt;br&gt;
🐧 Check  &lt;a href="https://go.aiven.io/ft-ai-db-optimizer" rel="noopener noreferrer"&gt;Aiven AI database optimizer&lt;/a&gt;! Powered by EverSQL 🐧&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Install pgbench locally
&lt;/h2&gt;

&lt;p&gt;In a Mac, &lt;a href="https://www.postgresql.org/docs/current/pgbench.html" rel="noopener noreferrer"&gt;pgbench&lt;/a&gt; comes with the default PostgreSQL installation via brew. Therefore to have pgbench all you need is to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create a test PostgreSQL environment
&lt;/h2&gt;

&lt;p&gt;While you could create a test PostgreSQL environment locally (the &lt;a href="https://ftisiot.net/posts/1brows" rel="noopener noreferrer"&gt;1brc challenge blog post&lt;/a&gt; contains all the details), this time we'll create an Aiven for PostgreSQL service in minutes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create an account&lt;/li&gt;
&lt;li&gt;Create an Aiven for PostgreSQL service on your favorite cloud provider and region &lt;/li&gt;
&lt;li&gt;Select the &lt;code&gt;startup-4&lt;/code&gt; plan, it will be sufficient for the test.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use pgbench to load test the a PostgreSQL database
&lt;/h2&gt;

&lt;p&gt;The Aiven for PostgreSQL database comes with a &lt;code&gt;defaultdb&lt;/code&gt; database we can use for our testing. &lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Initialize the database
&lt;/h3&gt;

&lt;p&gt;All we need is to grab the connection details from the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt; and we are ready to &lt;strong&gt;initialize&lt;/strong&gt; the database with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;HOST&amp;gt;       &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; &amp;lt;PORT&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-U&lt;/span&gt; &amp;lt;USER&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-i&lt;/span&gt;                  &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-s&lt;/span&gt; &amp;lt;SCALEFACTOR&amp;gt;    &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt;DBNAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;HOST&amp;gt;&lt;/code&gt; is the database hostname&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;PORT&amp;gt;&lt;/code&gt; is the database port&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;USER&amp;gt;&lt;/code&gt; is the database user&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;DBNAME&amp;gt;&lt;/code&gt; is the database name&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SCALEFACTOR&lt;/code&gt; is the test scale factor, &lt;code&gt;100&lt;/code&gt; could be a good place to start. The default is &lt;code&gt;1&lt;/code&gt; which will create a &lt;code&gt;16MB&lt;/code&gt; database, the &lt;code&gt;100&lt;/code&gt; scale will create a &lt;code&gt;1.6GB&lt;/code&gt; database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll be prompted to write the password, that we can find in the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: You can automate the flow without the need to include the password by storing it in the &lt;code&gt;PGPASSWORD&lt;/code&gt; environment variable with&lt;br&gt;
  &lt;code&gt;export PGPASSWORD=&amp;lt;PASSWORD&amp;gt;&lt;/code&gt; and then rerun the &lt;code&gt;pgbench&lt;/code&gt; command above.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After this pgbench will execute some work including the following tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;       Name       | nr_rows
------------------+---------
 pgbench_accounts | 10000000
 pgbench_branches |      100
 pgbench_history  |        0 
 pgbench_tellers  |     1000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: execute a first test to set the baseline
&lt;/h3&gt;

&lt;p&gt;After creating the dataset, it's time to define the performance baseline (a.k.a how much does my database perform under current conditions). Let's create one with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench  &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;HOST&amp;gt;                  &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; &amp;lt;PORT&amp;gt;                       &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-U&lt;/span&gt; &amp;lt;USER&amp;gt;                       &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; &amp;lt;NUMBER_OF_CLIENTS&amp;gt;          &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-j&lt;/span&gt; &amp;lt;NUMBER_OF_PGBENCH_THREADS&amp;gt;  &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-t&lt;/span&gt; &amp;lt;NUMBER_OF_TRANSACTIONS&amp;gt;     &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt;DBNAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;NUMBER_OF_CLIENTS&amp;gt;&lt;/code&gt; is the number of clients to connect to PostgreSQL with, a good starting point is &lt;code&gt;20&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;NUMBER_OF_PGBENCH_THREADS&amp;gt;&lt;/code&gt; is the number pf pgbench threads to run concurrently&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;NUMBER_OF_TRANSACTIONS&amp;gt;&lt;/code&gt; is the overall number of transactions to execute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following code will execute 1000 transactions using 5 parallel threads and 5 clients&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench  &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;HOST&amp;gt;      &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; &amp;lt;PORT&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-U&lt;/span&gt; &amp;lt;USER&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-j&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-t&lt;/span&gt; 1000             &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt;DBNAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pgbench (16.1, server 16.2)
starting vacuum...end.
transaction type: &amp;lt;builtin: TPC-B (sort of)&amp;gt;
scaling factor: 100
query mode: simple
number of clients: 5
number of threads: 5
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 5000/5000
number of failed transactions: 0 (0.000%)
latency average = 180.179 ms
initial connection time = 164.343 ms
tps = 27.750256 (without initial connection time)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Showcasing that we are generating almost 28 transactions per second (&lt;code&gt;tps&lt;/code&gt; in the above one) from my laptop sitting in Verona to a database sitting in Turin!&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: change parameters and validate the new performance
&lt;/h3&gt;

&lt;p&gt;What could we change? We could increase the percentage of total RAM dedicated to shared buffers to 60% by going in the &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt; selecting the PostgreSQL service, selecting the &lt;strong&gt;Service settings&lt;/strong&gt; menu and clicking on the &lt;strong&gt;Configure&lt;/strong&gt; button next to the Advanced configuration section.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcdokhkyx711ebqs5qcn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcdokhkyx711ebqs5qcn.png" alt="Shared buffers to 60%" width="800" height="628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, if we try with the same pgbench command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench  &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;HOST&amp;gt;      &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; &amp;lt;PORT&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-U&lt;/span&gt; &amp;lt;USER&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-j&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-t&lt;/span&gt; 1000             &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt;DBNAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get updated results with &lt;code&gt;36&lt;/code&gt; tps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: define you custom workload in a SQL file
&lt;/h3&gt;

&lt;p&gt;You can customize the type of queries pgbench will execute based on your workloads. All you need to do is to define a &lt;code&gt;SQL&lt;/code&gt; file and call it with pgbench &lt;code&gt;-f&lt;/code&gt; flag. Let's create a file named &lt;code&gt;test.sql&lt;/code&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;r1&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pgbench_accounts&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;bid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above we are setting a &lt;code&gt;r1&lt;/code&gt; variable with a random number between &lt;code&gt;1&lt;/code&gt; and &lt;code&gt;2000&lt;/code&gt; and then use it to query the &lt;code&gt;pgbench_accounts&lt;/code&gt; table for rows with &lt;code&gt;bid&lt;/code&gt; equal to &lt;code&gt;r1&lt;/code&gt;.&lt;br&gt;
We can now execute the custom sql file with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pgbench  &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;HOST&amp;gt;      &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; &amp;lt;PORT&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-U&lt;/span&gt; &amp;lt;USER&amp;gt;           &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-c&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-j&lt;/span&gt; 5                &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-t&lt;/span&gt; 1000             &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-f&lt;/span&gt; test.sql         &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt;DBNAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our example the above produces the following output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scaling factor: 1
query mode: simple
number of clients: 5
number of threads: 5
maximum number of tries: 1
number of transactions per client: 100
number of transactions actually processed: 500/500
number of failed transactions: 0 (0.000%)
latency average = 10514.270 ms
initial connection time = 245.617 ms
tps = 0.475544 (without initial connection time)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A miserable &lt;code&gt;0.47&lt;/code&gt; transactions per second. Can we do better?&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Optimize your database
&lt;/h3&gt;

&lt;p&gt;Once you took time to write a custom workload to test with pgbench and got a baseline, it's time to optimize your database. You could do it based on your knowledge and available data but, if you are using Aiven, you can receive automated insights about your PostgreSQL database workload and index and (in the future) SQL optimization suggestions. Check out the video below!&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/_hXGKhbgnos"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For more information, check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt; and &lt;a href="https://www.eversql.com/?utm_medium=organic&amp;amp;utm_source=ext_blog&amp;amp;utm_content=ftisiotpgbench" rel="noopener noreferrer"&gt;EverSQL&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In our case, Aiven will suggest to add an index on the &lt;code&gt;bid&lt;/code&gt; column.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvyomurrvs61z95y0nnh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvyomurrvs61z95y0nnh.png" alt="Suggestion to add an Index" width="800" height="497"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;pgbench_accounts_idx_bid&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="nv"&gt;"pgbench_accounts"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"bid"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After creating it, and retrying the pgbench command above, the custom workload speed raised from &lt;code&gt;0.45&lt;/code&gt; to &lt;code&gt;35&lt;/code&gt; transaction per second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;pgbench is a great tool to perform performance checks on your PostgreSQL database and the custom script option provides a way to customize it to match your workloads. Used with Aiven and its AI insights is a great way to test and optimize your database!&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>pgbench</category>
      <category>sql</category>
      <category>performance</category>
    </item>
    <item>
      <title>From dbf to PostgreSQL with Python</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Mon, 04 Mar 2024 15:00:00 +0000</pubDate>
      <link>https://dev.to/ftisiot/from-dbf-to-postgresql-with-python-4kfa</link>
      <guid>https://dev.to/ftisiot/from-dbf-to-postgresql-with-python-4kfa</guid>
      <description>&lt;p&gt;Some time ago I found an interesting database file suffix I never faced before: the &lt;code&gt;.dbf&lt;/code&gt; and saw aroung that it was first introduced in 1983 with dBASE II. This article showcases how we can automatically generate the PostgreSQL table and fill it with data using Python and &lt;a href="https://dbfread.readthedocs.io/en/latest/exporting_data.html#dataset-sql" rel="noopener noreferrer"&gt;dbfread&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you need a FREE PostgreSQL database? &lt;br&gt;
🦀 Check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven's FREE plans&lt;/a&gt;! 🦀&lt;br&gt;
⚡️ Need to optimize your SQL query with AI? ⚡️&lt;br&gt;
🐧 Check  &lt;a href="https://go.aiven.io/ft-ai-db-optimizer" rel="noopener noreferrer"&gt;Aiven AI database optimizer&lt;/a&gt;! Powered by EverSQL 🐧&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Download a sample dbf file
&lt;/h2&gt;

&lt;p&gt;We can get a sample &lt;code&gt;.dbf&lt;/code&gt; file from &lt;a href="https://github.com/infused/dbf/blob/master/spec/fixtures/cp1251.dbf" rel="noopener noreferrer"&gt;Infused&lt;/a&gt; with the following in our terminal&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget https://github.com/infused/dbf/raw/master/spec/fixtures/cp1251.dbf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will store a file named &lt;code&gt;cp1251.dbf&lt;/code&gt; in the current folder.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use dbfread to move the data into PostgreSQL
&lt;/h2&gt;

&lt;p&gt;We need to install simpledf with&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dbfread
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can write a Python script (named &lt;code&gt;convert_bdf_to_sql.py&lt;/code&gt;) that opens the &lt;code&gt;sample.dbf&lt;/code&gt; file and creates the PostgreSQL DDL and loads the data into a &lt;code&gt;CSV&lt;/code&gt; file we can use to populate the database&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dbfread&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DBF&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;

&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;postgresql://[USER]:[PWD]@[HOST]:[PORT]/[DBNAME]?sslmode=require&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;people&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nc"&gt;DBF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cp1251.dbf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lowernames&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above script we:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;connect to a PostgreSQL instance using 

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;[USER]&lt;/code&gt;: the username&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[PWD]&lt;/code&gt;: the password&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[HOST]&lt;/code&gt;: the hostname&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[PORT]&lt;/code&gt;: the port&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[DBNAME]&lt;/code&gt;: the database name&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;define a table named &lt;code&gt;people&lt;/code&gt; that will be created and populated&lt;/li&gt;

&lt;li&gt;insert into the &lt;code&gt;people&lt;/code&gt; table all the records in &lt;code&gt;cp1251.dbf&lt;/code&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;We can then execute it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python convert_bdf_to_sql.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we now connect to our PostgreSQL database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;psql postgres://[USER]:[PWD]@[HOST]:[PORT]/[DBNAME]?sslmode&lt;span class="o"&gt;=&lt;/span&gt;require
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we can check the &lt;code&gt;people&lt;/code&gt; table being populated with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;people&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see the data in the table!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | rn |                  name
----+----+----------------------------------------
  1 |  1 | амбулаторно-поликлиническое
  2 |  2 | больничное
  3 |  3 | НИИ
  4 |  4 | образовательное медицинское учреждение
(4 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>postgres</category>
      <category>dbf</category>
      <category>sql</category>
      <category>python</category>
    </item>
    <item>
      <title>List of PostgreSQL® AI Projects and Resources</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Thu, 29 Feb 2024 10:00:00 +0000</pubDate>
      <link>https://dev.to/ftisiot/list-of-postgresqlr-ai-projects-and-resources-4142</link>
      <guid>https://dev.to/ftisiot/list-of-postgresqlr-ai-projects-and-resources-4142</guid>
      <description>&lt;p&gt;Everyone is now talking about AI, and modern databases like PostgreSQL® are increasingly being adopted in companies' AI journey as sources of data or key pieces of the AI infrastructure. Moreover there's a new set projects that are solving PostgreSQL problems with AI.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/ftisiot/postgresql-ai-projects" rel="noopener noreferrer"&gt;List of PostgreSQL® AI projects and resources&lt;/a&gt; is open-source project to collect PostgreSQL® extensions, applications and resources (video or blogs) talking about how our loved database can fit in the AI journey.&lt;/p&gt;

&lt;p&gt;Did you write an article, gave a talk, are you working on an extension regarding PostgreSQL and AI? &lt;strong&gt;PRs are welcome&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjp4ic9m4h861impwu7o9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjp4ic9m4h861impwu7o9.png" alt="Preview of List of PostgreSQL® AI Projects and Resources" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>ai</category>
      <category>projects</category>
      <category>resources</category>
    </item>
    <item>
      <title>11 Lessons to learn when using NULLs in PostgreSQL®</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Wed, 28 Feb 2024 10:00:00 +0000</pubDate>
      <link>https://dev.to/ftisiot/11-lessons-to-learn-when-using-nulls-in-postgresqlr-2b7g</link>
      <guid>https://dev.to/ftisiot/11-lessons-to-learn-when-using-nulls-in-postgresqlr-2b7g</guid>
      <description>&lt;p&gt;A boolean value should only contain two values, &lt;code&gt;True&lt;/code&gt; or &lt;code&gt;False&lt;/code&gt;, but is it correct? Usually people assume so, but sometimes miss the fact that there could be the &lt;strong&gt;absence&lt;/strong&gt; of the value all-together. In databases this is absence is usually stored as &lt;code&gt;NULL&lt;/code&gt; and this blog showcases how to find them, use them properly and 11 lessons to learn to be a &lt;code&gt;NULL&lt;/code&gt; Pro! &lt;/p&gt;

&lt;p&gt;Keep in mind, it's not only booleans that can contain &lt;code&gt;NULL&lt;/code&gt; values, it's all the columns where you don't define a &lt;code&gt;NOT NULL&lt;/code&gt; constraint!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you need a FREE PostgreSQL database? &lt;br&gt;
🦀 Check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven's FREE plans&lt;/a&gt;! 🦀&lt;br&gt;
⚡️ Need to optimize your SQL query with AI? ⚡️&lt;br&gt;
🐧 Check  &lt;a href="https://go.aiven.io/ft-ai-db-optimizer" rel="noopener noreferrer"&gt;Aiven AI database optimizer&lt;/a&gt;! Powered by EverSQL 🐧&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  It all starts with some columns and rows
&lt;/h2&gt;

&lt;p&gt;Let's start from the basics: you have a PostgreSQL® database and a table, called &lt;code&gt;users&lt;/code&gt; like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Hey, do you want to test the above code but not having a PostgreSQL database handy? Past your code in &lt;a href="https://pgplayground.com/?utm_medium=organic&amp;amp;utm_source=ext_blog&amp;amp;utm_content=ftisiotNULL" rel="noopener noreferrer"&gt;PostgreSQL Playground&lt;/a&gt; and quickly check the results!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's insert some data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'jdoe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Jon'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Doe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'lspencer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Liz'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Spencer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'hlondon'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Hanna'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'London'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Querying the data showcases the table with all the columns filled.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name  | surname | age 
----+----------+-------+---------+-----
  1 | jdoe     | Jon   | Doe     |  25
  2 | lspencer | Liz   | Spencer |  35
  3 | hlondon  | Hanna | London  |  45
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Insert NULLs
&lt;/h2&gt;

&lt;p&gt;Now, let's check if we can insert some &lt;code&gt;NULL&lt;/code&gt;s, let's try by inserting them in the &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;surname&lt;/code&gt; and &lt;code&gt;age&lt;/code&gt; columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'test'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works since we don't have any constraint&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name  | surname | age 
----+----------+-------+---------+-----
  1 | jdoe     | Jon   | Doe     |  25
  2 | lspencer | Liz   | Spencer |  35
  3 | hlondon  | Hanna | London  |  45
  4 | test     |       |         |    
(4 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can even simplify the insert with the same effect&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'test1'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name  | surname | age 
----+----------+-------+---------+-----
  1 | jdoe     | Jon   | Doe     |  25
  2 | lspencer | Liz   | Spencer |  35
  3 | hlondon  | Hanna | London  |  45
  4 | test     |       |         |    
  5 | test1    |       |         |    
(5 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Inserting NULLs
&lt;/h2&gt;

&lt;p&gt;The first step to avoid &lt;code&gt;NULL&lt;/code&gt;s from appearing in a table is to forbid inserting them. The section below showcases a few situations on how to avoid inserting &lt;code&gt;NULL&lt;/code&gt;s in new and existing columns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inserting NULLs in the primary key
&lt;/h3&gt;

&lt;p&gt;Can I insert a &lt;code&gt;NULL&lt;/code&gt; in the table primary key? In theory this could be allowed since there's no explicit &lt;code&gt;NOT NULL&lt;/code&gt; constraint on the column. Let's try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However this fails with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR:  null value in column "username" of relation "users" violates not-null constraint
DETAIL:  Failing row contains (6, null, null, null, null).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;💡 Lesson 1 💡&lt;/strong&gt;: Primary Keys are by default &lt;code&gt;NOT NULL&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a new column with NOT NULL constraint
&lt;/h3&gt;

&lt;p&gt;Let's add a new column to our table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will fail, since the previously inserted data don't have any associated value for &lt;code&gt;points&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR:  column "points" of relation "users" contains null values
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's add a default value with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The table was altered, and now contains &lt;code&gt;0 points&lt;/code&gt; for all the existing rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name  | surname | age | points 
----+----------+-------+---------+-----+--------
  1 | jdoe     | Jon   | Doe     |  25 |      0
  2 | lspencer | Liz   | Spencer |  35 |      0
  3 | hlondon  | Hanna | London  |  45 |      0
  4 | test     |       |         |     |      0
  5 | test1    |       |         |     |      0
(5 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we try to insert a new row without specifying the &lt;code&gt;points&lt;/code&gt; column:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'usrwithnullpoints'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The row is inserted with the default &lt;code&gt;0 points&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |     username      | name  | surname | age | points 
----+-------------------+-------+---------+-----+--------
  1 | jdoe              | Jon   | Doe     |  25 |      0
  2 | lspencer          | Liz   | Spencer |  35 |      0
  3 | hlondon           | Hanna | London  |  45 |      0
  4 | test              |       |         |     |      0
  5 | test1             |       |         |     |      0
  7 | usrwithnullpoints |       |         |     |      0
(6 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What if we try to push a &lt;code&gt;NULL&lt;/code&gt; into &lt;code&gt;points&lt;/code&gt;?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'forcenullpoints'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get the &lt;code&gt;ERROR:  null value in column "points" of relation "users" violates not-null constraint&lt;/code&gt; stating that our insert is violating the &lt;code&gt;NOT NULL&lt;/code&gt; constraint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 2 💡&lt;/strong&gt;: Setting a &lt;code&gt;NOT NULL&lt;/code&gt; constraint doesn't allow inserting &lt;code&gt;NULL&lt;/code&gt;s&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 3 💡&lt;/strong&gt;: If we want to add a &lt;code&gt;NOT NULL&lt;/code&gt; column to an existing table with data, we need to specify the &lt;strong&gt;default&lt;/strong&gt; value for existing rows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add a NOT NULL constraint to an existing column
&lt;/h3&gt;

&lt;p&gt;If we try to change an existing column (e.g. &lt;code&gt;name&lt;/code&gt;) to add the &lt;code&gt;NOT NULL&lt;/code&gt; constraint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get the similar error &lt;code&gt;ERROR:  column "name" of relation "users" contains null values&lt;/code&gt;, therefore we need also need to amend the current &lt;code&gt;NULL&lt;/code&gt; values (and possibly set a default)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
    &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Hugo'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'Hugo'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above we: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Altered the type, setting it to &lt;code&gt;TEXT&lt;/code&gt; again, to apply the function &lt;code&gt;COALESCE(name, 'Hugo')&lt;/code&gt; which is replacing &lt;code&gt;NULL&lt;/code&gt;s in the &lt;code&gt;name&lt;/code&gt; field with &lt;code&gt;Hugo&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set the default value for the &lt;code&gt;name&lt;/code&gt; column to &lt;code&gt;Hugo&lt;/code&gt; &amp;lt;- This is not strictly necessary&lt;/li&gt;
&lt;li&gt;Applied the &lt;code&gt;NOT NULL&lt;/code&gt; constraint since we don't have any &lt;code&gt;NULL&lt;/code&gt; values anymore&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: the above method locks the entire table until done. A faster approach could be to split the &lt;code&gt;COALESCE(name, 'Hugo')&lt;/code&gt; into a dedicated update and then only perform the &lt;code&gt;NOT NULL&lt;/code&gt; and &lt;code&gt;DEFAULT&lt;/code&gt; settings on the &lt;code&gt;ALTER TABLE&lt;/code&gt; statement.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The results are in line with the expected, no &lt;code&gt;NULL&lt;/code&gt;s in the &lt;code&gt;name&lt;/code&gt; column, being replaced by &lt;code&gt;Hugo&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |     username      | name  | surname | age | points 
----+-------------------+-------+---------+-----+--------
  1 | jdoe              | Jon   | Doe     |  25 |      0
  2 | lspencer          | Liz   | Spencer |  35 |      0
  3 | hlondon           | Hanna | London  |  45 |      0
  4 | test              | Hugo  |         |     |      0
  5 | test1             | Hugo  |         |     |      0
  7 | usrwithnullpoints | Hugo  |         |     |      0
(6 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;💡 Lesson 4 💡&lt;/strong&gt;: If we need to add a &lt;code&gt;NOT NULL&lt;/code&gt; column to an existing column in a table, we need to amend the existing &lt;code&gt;NULL&lt;/code&gt; values and optionally set the &lt;strong&gt;default&lt;/strong&gt; value for incoming rows.&lt;/p&gt;

&lt;p&gt;What if I want to &lt;strong&gt;remove&lt;/strong&gt; the &lt;code&gt;NOT NULL&lt;/code&gt; constraint? We can do it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
    &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Is DEFAULT enough?
&lt;/h3&gt;

&lt;p&gt;What if we leave the &lt;code&gt;NOT NULL&lt;/code&gt; constraint aside, and just set the &lt;code&gt;DEFAULT&lt;/code&gt;? Let's try with the &lt;code&gt;date_first_login&lt;/code&gt; new column&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;date_first_login&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Checking what's in the table, we see all the &lt;code&gt;date_first_login&lt;/code&gt; filled&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |     username      | name  | surname | age | points | date_first_login 
----+-------------------+-------+---------+-----+--------+------------------
  1 | jdoe              | Jon   | Doe     |  25 |      0 | 2024-02-27
  2 | lspencer          | Liz   | Spencer |  35 |      0 | 2024-02-27
  3 | hlondon           | Hanna | London  |  45 |      0 | 2024-02-27
  4 | test              | Hugo  |         |     |      0 | 2024-02-27
  5 | test1             | Hugo  |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints | Hugo  |         |     |      0 | 2024-02-27
(6 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we try to insert a new row, without specifying the &lt;code&gt;date_first_login&lt;/code&gt;, it's correctly assigned to today&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user_with_no_date_first_login'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above is correctly inserted with a not null &lt;code&gt;date_first_login&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |           username            | name  | surname | age | points | date_first_login 
----+-------------------------------+-------+---------+-----+--------+------------------
  1 | jdoe                          | Jon   | Doe     |  25 |      0 | 2024-02-27
  2 | lspencer                      | Liz   | Spencer |  35 |      0 | 2024-02-27
  3 | hlondon                       | Hanna | London  |  45 |      0 | 2024-02-27
  4 | test                          | Hugo  |         |     |      0 | 2024-02-27
  5 | test1                         | Hugo  |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints             | Hugo  |         |     |      0 | 2024-02-27
  9 | user_with_no_date_first_login | Hugo  |         |     |      0 | 2024-02-27
(7 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So... is the &lt;code&gt;NOT NULL&lt;/code&gt; constraint really needed? The reply is &lt;strong&gt;YES&lt;/strong&gt; since I could insert &lt;code&gt;NULLs&lt;/code&gt; by specifying it in the &lt;code&gt;INSERT&lt;/code&gt; statement&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date_first_login&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user_with_null_date_first_login'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which will be stored correctly with a &lt;code&gt;NULL&lt;/code&gt; value for &lt;code&gt;date_first_login&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |            username             | name  | surname | age | points | date_first_login 
----+---------------------------------+-------+---------+-----+--------+------------------
  1 | jdoe                            | Jon   | Doe     |  25 |      0 | 2024-02-27
  2 | lspencer                        | Liz   | Spencer |  35 |      0 | 2024-02-27
  3 | hlondon                         | Hanna | London  |  45 |      0 | 2024-02-27
  4 | test                            | Hugo  |         |     |      0 | 2024-02-27
  5 | test1                           | Hugo  |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints               | Hugo  |         |     |      0 | 2024-02-27
  9 | user_with_no_date_first_login   | Hugo  |         |     |      0 | 2024-02-27
 10 | user_with_null_date_first_login | Hugo  |         |     |      0 | 
(8 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;💡 Lesson 5 💡&lt;/strong&gt;: The &lt;code&gt;DEFAULT&lt;/code&gt; is going to prevent &lt;code&gt;NULL&lt;/code&gt;s from being inserted only if the &lt;code&gt;NULL&lt;/code&gt; is &lt;strong&gt;NOT&lt;/strong&gt; specified in the target column. To enforce the value we need an explicit &lt;code&gt;NOT NULL&lt;/code&gt; constraint&lt;/p&gt;

&lt;h2&gt;
  
  
  Querying NULLs
&lt;/h2&gt;

&lt;p&gt;In the above section we tried to avoid storing &lt;code&gt;NULL&lt;/code&gt;s in our systems, what if we need to deal with existing &lt;code&gt;NULL&lt;/code&gt;s in our column? Here we can use a few SQL functions to detect them and change their value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Detecting NULLs with the &lt;code&gt;IS NULL&lt;/code&gt; condition
&lt;/h3&gt;

&lt;p&gt;What if we want to detect the &lt;code&gt;NULL&lt;/code&gt; values? We can use the &lt;code&gt;IS NULL&lt;/code&gt; condition, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which will retrieve the list of &lt;code&gt;NULL&lt;/code&gt;s surnames&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |            username             | name | surname | age | points | date_first_login 
----+---------------------------------+------+---------+-----+--------+------------------
  4 | test                            | Hugo |         |     |      0 | 2024-02-27
  5 | test1                           | Hugo |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints               | Hugo |         |     |      0 | 2024-02-27
  9 | user_with_no_date_first_login   | Hugo |         |     |      0 | 2024-02-27
 10 | user_with_null_date_first_login | Hugo |         |     |      0 | 
(5 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The opposite list could be retrieved by applying the &lt;code&gt;IS NOT NULL&lt;/code&gt; condition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Resulting in&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name  | surname | age | points | date_first_login 
----+----------+-------+---------+-----+--------+------------------
  1 | jdoe     | Jon   | Doe     |  25 |      0 | 2024-02-27
  2 | lspencer | Liz   | Spencer |  35 |      0 | 2024-02-27
  3 | hlondon  | Hanna | London  |  45 |      0 | 2024-02-27
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What if I try to use the equal comparison (&lt;code&gt;=&lt;/code&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above retrieves &lt;strong&gt;NO rows&lt;/strong&gt; since the &lt;code&gt;NULL&lt;/code&gt; value means there is &lt;strong&gt;NO value&lt;/strong&gt; and is &lt;strong&gt;NOT&lt;/strong&gt; equal to any other value (&lt;code&gt;NULL&lt;/code&gt;=&lt;code&gt;NULL&lt;/code&gt; is &lt;strong&gt;False&lt;/strong&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | username | name | surname | age | points | date_first_login 
----+----------+------+---------+-----+--------+------------------
(0 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we apply the different operator (&lt;code&gt;&amp;lt;&amp;gt;&lt;/code&gt;) with &lt;code&gt;NULL&lt;/code&gt; we also get back &lt;strong&gt;NO rows&lt;/strong&gt;, since the &lt;code&gt;NULL&lt;/code&gt; value means there is &lt;strong&gt;NO value&lt;/strong&gt;, and therefore is not equal or different to other values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;💡 Lesson 6 💡&lt;/strong&gt;: To find &lt;code&gt;NULL&lt;/code&gt;s we should always use the &lt;code&gt;IS NULL&lt;/code&gt;; the &lt;code&gt;IS NOT NULL&lt;/code&gt; to fetch the not null rows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Associating a default value to NULLs with COALESCE
&lt;/h3&gt;

&lt;p&gt;In the above we saw how to find &lt;code&gt;NULL&lt;/code&gt;s, what if we need to include them in a &lt;code&gt;SELECT&lt;/code&gt; statement with other values? We can either add an &lt;code&gt;OR&lt;/code&gt; condition like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'London'&lt;/span&gt; 
&lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or we can apply a transformation to change the &lt;code&gt;NULL&lt;/code&gt;s into a value we can compare&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Verona'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'London'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Verona'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both will have the same result&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |            username             | name  | surname | age | points | date_first_login 
----+---------------------------------+-------+---------+-----+--------+------------------
  3 | hlondon                         | Hanna | London  |  45 |      0 | 2024-02-27
  4 | test                            | Hugo  |         |     |      0 | 2024-02-27
  5 | test1                           | Hugo  |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints               | Hugo  |         |     |      0 | 2024-02-27
  9 | user_with_no_date_first_login   | Hugo  |         |     |      0 | 2024-02-27
 10 | user_with_null_date_first_login | Hugo  |         |     |      0 | 
(6 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, if we apply the second option, we must be sure that &lt;code&gt;Verona&lt;/code&gt; (the label we associate to &lt;code&gt;NULL&lt;/code&gt; values) is a value not present in the column or a value we want to include in our selection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 7 💡&lt;/strong&gt;: When using &lt;code&gt;COALESCE&lt;/code&gt; check out that the value you want to use as substitute of &lt;code&gt;NULL&lt;/code&gt; is not already present in the table and outside of your business logic otherwise you might include in the filter more values than expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performing Math on NULL columns
&lt;/h3&gt;

&lt;p&gt;A similar scenario is due when we need to perform a calculation over a &lt;code&gt;NULL&lt;/code&gt;able column. Let's take the example of the &lt;code&gt;age&lt;/code&gt; column: what if we want to calculate the average age?&lt;/p&gt;

&lt;p&gt;Let's see the data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id |            username             | name  | surname | age | points | date_first_login 
----+---------------------------------+-------+---------+-----+--------+------------------
  1 | jdoe                            | Jon   | Doe     |  25 |      0 | 2024-02-27
  2 | lspencer                        | Liz   | Spencer |  35 |      0 | 2024-02-27
  3 | hlondon                         | Hanna | London  |  45 |      0 | 2024-02-27
  4 | test                            | Hugo  |         |     |      0 | 2024-02-27
  5 | test1                           | Hugo  |         |     |      0 | 2024-02-27
  7 | usrwithnullpoints               | Hugo  |         |     |      0 | 2024-02-27
  9 | user_with_no_date_first_login   | Hugo  |         |     |      0 | 2024-02-27
 10 | user_with_null_date_first_login | Hugo  |         |     |      0 | 
(8 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can calculate the average age with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Resulting in &lt;code&gt;35&lt;/code&gt; which is the average between the 3 &lt;strong&gt;NOT NULL&lt;/strong&gt; ages! Similarly if we count the &lt;code&gt;date_first_login&lt;/code&gt;s with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_first_login&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is &lt;code&gt;7&lt;/code&gt;, one row for every user having a not null &lt;code&gt;date_first_login&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 8 💡&lt;/strong&gt;: When performing Mathematical operation with nullable columns, only &lt;strong&gt;NOT NULL&lt;/strong&gt; values will be elaborated!&lt;/p&gt;

&lt;p&gt;What if we try to calculate the average &lt;code&gt;points&lt;/code&gt;? First let's update the table to give &lt;code&gt;jdoe&lt;/code&gt; 25 points&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'jdoe'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's calcuate the average points&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is &lt;code&gt;3.125&lt;/code&gt; which is the mathematical average between the &lt;code&gt;25&lt;/code&gt; points of &lt;code&gt;jdoe&lt;/code&gt; and the &lt;code&gt;0&lt;/code&gt; of the rest due to us previously setting &lt;code&gt;0&lt;/code&gt; as default value. Is this correct? It depends! If we wanted to calculate the average number of points of users actually playing some games, this could be a misleading calculation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 9 💡&lt;/strong&gt;: When performing Mathematical operation with nullable columns, the default value of a column will impact the results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Joining on NULL columns
&lt;/h3&gt;

&lt;p&gt;Building on what we explored before about the &lt;code&gt;NULL&lt;/code&gt; not being equal or different to any other value, we need to pay closer attention when we deal with nullable columns in join condition. &lt;/p&gt;

&lt;p&gt;Let's create another table called &lt;code&gt;sessions&lt;/code&gt; and insert some data. For a weird reason, the design of this table will force us to join with &lt;code&gt;users&lt;/code&gt; on the &lt;code&gt;surname&lt;/code&gt; which is a NULLABLE field.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;surname&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Doe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Doe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Spencer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Spencer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, if we want to count users having a session on each day we could think of simply writing the above query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
    &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However the results shows only the 4 sessions below&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; extract | count 
---------+-------
      25 |     2
      26 |     1
      27 |     1
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why? It's again because of &lt;code&gt;NULL&lt;/code&gt; values in the join condition. This problem of poor data quality on both ends of the join is stopping us from reporting any useful number. How could we fix it? Well, the fix should be done uphill by setting the &lt;code&gt;sessions&lt;/code&gt; table to use the &lt;code&gt;username&lt;/code&gt; instead of the &lt;code&gt;surname&lt;/code&gt; and defining a proper &lt;code&gt;FOREIGN KEY&lt;/code&gt; to the &lt;code&gt;users&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 10 💡&lt;/strong&gt;: Joining tables on nullable columns without caring about the NULL management will impact the results by reducing the number of rows included in the join.&lt;/p&gt;

&lt;p&gt;Still, if we want a simplistic solution, we could do the apply the &lt;code&gt;COALESCE&lt;/code&gt; to both sides of the join:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
    &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'NoSurname'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'NoSurname'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result shows an increased count of sessions&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; extract | count 
---------+-------
      25 |     7
      26 |     6
      27 |     6
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However this is still not the correct answer, since we are seeing &lt;code&gt;19&lt;/code&gt; session being aggregated while only &lt;code&gt;9&lt;/code&gt; sessions were inserted in the original &lt;code&gt;sessions&lt;/code&gt; table. Why is that? It's because we now casted all the &lt;code&gt;NULL&lt;/code&gt;s as &lt;code&gt;NoSurname&lt;/code&gt; on both sides of the join. Since we had &lt;code&gt;5&lt;/code&gt; sessions without &lt;code&gt;surname&lt;/code&gt; and &lt;code&gt;5&lt;/code&gt; users without &lt;code&gt;surname&lt;/code&gt; we are creating a cartesian join of all the empty surnames with the empty sessions.&lt;/p&gt;

&lt;p&gt;We had the following users without &lt;code&gt;surname&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test
test1
usrwithnullpoints
user_with_no_date_first_login
user_with_null_date_first_login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the following sessions without &lt;code&gt;surname&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  6 | 2024-02-27 14:10:08.419021
  7 | 2024-02-25 14:10:08.419021
  8 | 2024-02-25 14:10:08.419021
  9 | 2024-02-26 14:10:08.419021
 10 | 2024-02-27 14:10:08.419021
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of the users was associated with each sessions, creating a overflow of sessions.&lt;/p&gt;

&lt;p&gt;A more correct solution, attribuiting all the &lt;code&gt;NULL&lt;/code&gt; session to a same phantom user &lt;code&gt;NoUser&lt;/code&gt; could be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'NoUser'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="k"&gt;left&lt;/span&gt; &lt;span class="k"&gt;outer&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; 
    &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surname&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;session_time&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However this is an effort to fix data quality issues at query time, which is never a good idea.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;💡 Lesson 11 💡&lt;/strong&gt;: Fixing data quality issues related to nullable columns at query time is a risky approach. Solving them with proper table design and constraints allows you to keep quality data in the tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Dealing with &lt;code&gt;NULL&lt;/code&gt;s is something to care upfront, during table design, in order to avoid data quality issues downstream! &lt;br&gt;
This is it for the 11 lessons learnt using &lt;code&gt;NULL&lt;/code&gt;s, do you have others to share? Feel free to reach me on X/Twitter at &lt;a class="mentioned-user" href="https://dev.to/ftisiot"&gt;@ftisiot&lt;/a&gt; or on &lt;a href="https://www.linkedin.com/in/francescotisiot/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>sql</category>
      <category>null</category>
      <category>isnull</category>
    </item>
    <item>
      <title>How to load JSON data in PostgreSQL with the COPY command</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Fri, 02 Feb 2024 13:00:00 +0000</pubDate>
      <link>https://dev.to/ftisiot/how-to-load-json-data-in-postgresql-with-the-the-copy-command-4gmh</link>
      <guid>https://dev.to/ftisiot/how-to-load-json-data-in-postgresql-with-the-the-copy-command-4gmh</guid>
      <description>&lt;p&gt;You have a JSON dataset that you want to upload to a PostgreSQL table containing properly formatted rows and columns... How do you do it? &lt;/p&gt;

&lt;p&gt;All the main sources like &lt;a href="https://ftisiot.net/postgresqljson/how-to-load-json-postgresql" rel="noopener noreferrer"&gt;my own blog&lt;/a&gt; and &lt;a href="https://konbert.com/blog/import-json-into-postgres-using-copy" rel="noopener noreferrer"&gt;others&lt;/a&gt; tell you to load the JSON in a dedicated temporary table containing a unique &lt;code&gt;JSON&lt;/code&gt; column, then parse it and load into the target table. However there could be another way, avoiding the temp table!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuazrsvjv9ckk3fgmlr14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuazrsvjv9ckk3fgmlr14.png" alt="Traditional and new JSON Load options" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this blog we'll see how to upload the JSON directly using PostgreSQL &lt;code&gt;COPY&lt;/code&gt; command and using an utility called &lt;a href="https://jqlang.github.io/jq/" rel="noopener noreferrer"&gt;jq&lt;/a&gt;!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you need a FREE PostgreSQL database? 🦀 Check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven's FREE plans&lt;/a&gt;! 🦀&lt;br&gt;
⚡️ Need to optimize your SQL query with AI? ⚡️&lt;br&gt;
🐧 Check  &lt;a href="https://go.aiven.io/ft-ai-db-optimizer" rel="noopener noreferrer"&gt;Aiven AI database optimizer&lt;/a&gt;! Powered by EverSQL 🐧&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  PostgreSQL COPY command
&lt;/h2&gt;

&lt;p&gt;First of all, let's check PostgreSQL &lt;a href="https://www.postgresql.org/docs/current/sql-copy.html" rel="noopener noreferrer"&gt;COPY command&lt;/a&gt;. It's a command that allows you to copy into a PostgreSQL table data from a file, there are two versions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;COPY&lt;/code&gt; if the file is sitting in the PG server already&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;\COPY&lt;/code&gt; if the file is sitting in the client machine connected to the server via &lt;code&gt;psql&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In both cases, the standard &lt;code&gt;COPY&lt;/code&gt; command has the following minimal set of parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;OPTIONAL&lt;/span&gt; &lt;span class="n"&gt;LIST&lt;/span&gt; &lt;span class="k"&gt;OF&lt;/span&gt; &lt;span class="n"&gt;COLUMNS&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;SOURCE&lt;/span&gt; &lt;span class="n"&gt;FILE&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;FORMAT&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;TARGET TABLE&amp;gt;&lt;/code&gt; is the name of the target table&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;(&amp;lt;OPTIONAL LIST OF COLUMNS&amp;gt;)&lt;/code&gt; defines the columns in the table to load&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;SOURCE FILE&amp;gt;&lt;/code&gt; is pointing to the source file to load&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;FORMAT&amp;gt;&lt;/code&gt; defines the format of the data&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The full list of parameters is available in the &lt;a href="https://www.postgresql.org/docs/current/sql-copy.html" rel="noopener noreferrer"&gt;PostgreSQL COPY documentation&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  PostgreSQL COPY command - the out of the box formats
&lt;/h3&gt;

&lt;p&gt;Let's focus on the formats available in the &lt;a href="https://www.postgresql.org/docs/current/sql-copy.html" rel="noopener noreferrer"&gt;PostgreSQL COPY documentation&lt;/a&gt; are listed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;TEXT&lt;/code&gt;: can be used to load a full text&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CSV&lt;/code&gt;: to load comma (or otherwise separated) values&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BINARY&lt;/code&gt;: to load binary files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unfortunately there doesn't seem to be an out of the box way to load JSON files!&lt;/p&gt;

&lt;h2&gt;
  
  
  The &lt;code&gt;PROGRAM&lt;/code&gt; option to the rescue!
&lt;/h2&gt;

&lt;p&gt;A, maybe not known, option of the &lt;code&gt;COPY&lt;/code&gt; command is to point to a program to execute instead of the file. This option can be called with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;OPTIONAL&lt;/span&gt; &lt;span class="n"&gt;LIST&lt;/span&gt; &lt;span class="k"&gt;OF&lt;/span&gt; &lt;span class="n"&gt;COLUMNS&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;PROGRAM&lt;/span&gt; &lt;span class="nv"&gt;"&amp;lt;SET OF INSTRUCTIONS&amp;gt;"&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compared to the previous &lt;code&gt;\copy&lt;/code&gt; call, this time we are adding the &lt;code&gt;PROGRAM&lt;/code&gt; with a set of instructions delimited by quotes or double quotes that will be executed on the client machine before loading the data.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you are using the &lt;code&gt;COPY&lt;/code&gt; command on the server, you'll probably need a superuser. This is the error message shown in &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt;: &lt;code&gt;ERROR:  must be superuser or have privileges of the pg_read_server_files role to COPY from a file&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, what we can do, is to reshape the data before loading it.&lt;/p&gt;

&lt;h3&gt;
  
  
  jq - the indispensable JSON parsing tool
&lt;/h3&gt;

&lt;p&gt;I've been using jq quite a while in a lot of blog posts, it's a very handy tool to parse, reshape, select JSON documents. For the purpose of this blog, we'll use to reshape the JSON input into a CSV format, digestible from the PostgreSQL &lt;code&gt;COPY&lt;/code&gt; command. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You need to have &lt;a href="https://jqlang.github.io/jq/" rel="noopener noreferrer"&gt;jq&lt;/a&gt; installed on the workstation from where the &lt;code&gt;COPY&lt;/code&gt; command is executed!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's create a basic JSON file with named &lt;code&gt;test.json&lt;/code&gt; with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mystring"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"ciao"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mystring"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"sole"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mystring"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"mare"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With jq we can read and reshape the above JSON to a CSV format with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;more test.json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;". | [.id, .mystring] | @csv"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;more test.json&lt;/code&gt; reads the file&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;jq -r&lt;/code&gt; prints the raw output&lt;/li&gt;
&lt;li&gt;the first &lt;code&gt;.&lt;/code&gt; selects all the elements at the root level&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;| [.id, .mystring]&lt;/code&gt; retrieves the &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;mystring&lt;/code&gt; keys from each element&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;|@csv&lt;/code&gt; sets the output format as CSV&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1,"ciao"
2,"sole"
3,"mare"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;To check the complete set of options available with jq please view the &lt;a href="https://jqlang.github.io/jq/manual/" rel="noopener noreferrer"&gt;manual&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Stitching all together
&lt;/h2&gt;

&lt;p&gt;So, how can we load a target table with just 1 &lt;code&gt;COPY&lt;/code&gt; command? Let's first create the target table with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;MYTARGETTABLE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myid&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mystring&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We now want to load the &lt;code&gt;myid&lt;/code&gt; and &lt;code&gt;mystring&lt;/code&gt; columns of the &lt;code&gt;MYTARGETTABLE&lt;/code&gt; table with the following &lt;code&gt;COPY&lt;/code&gt; command reading from the &lt;code&gt;test.json&lt;/code&gt; and applying the transformation with &lt;code&gt;jq&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;MYTARGETTABLE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mystring&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;PROGRAM&lt;/span&gt; &lt;span class="s1"&gt;'more test.json | jq -r ". | [.id, .mystring] | @csv"'&lt;/span&gt;  
&lt;span class="n"&gt;CSV&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is the data properly loaded in the &lt;code&gt;MYTARGETTABLE&lt;/code&gt; table&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; id | myid | mystring
----+------+----------
  1 |    1 | ciao
  2 |    2 | sole
  3 |    3 | mare
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Solving problems at the source sometimes is useful to avoid extra hops! Stitching together &lt;code&gt;COPY&lt;/code&gt; (with &lt;code&gt;PROGRAM&lt;/code&gt;) and &lt;code&gt;jq&lt;/code&gt; provides us the flexibility to load JSON files without intermediary tables.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>json</category>
      <category>copy</category>
      <category>dataload</category>
    </item>
    <item>
      <title>Load StackOverflow's StackExchange data in PostgreSQL®</title>
      <dc:creator>Francesco Tisiot</dc:creator>
      <pubDate>Wed, 31 Jan 2024 10:00:00 +0000</pubDate>
      <link>https://dev.to/ftisiot/load-stackoverflows-stackexchange-data-in-postgresqlr-40ip</link>
      <guid>https://dev.to/ftisiot/load-stackoverflows-stackexchange-data-in-postgresqlr-40ip</guid>
      <description>&lt;p&gt;I recently found about the StackOverflow dataset in &lt;a href="https://www.kaggle.com/datasets/stackoverflow/stackoverflow" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt;, which points to the &lt;a href="https://archive.org/download/stackexchange" rel="noopener noreferrer"&gt;StackExchange&lt;/a&gt; link to download the entire data. In this blog I'll show a (maybe not 100% polished) way to upload the data to several tables in a PostgreSQL database!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmobjqsimv80kendb9fb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmobjqsimv80kendb9fb.png" alt="Architecture of the data loading phase" width="800" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The data is provided by the &lt;a href="https://archive.org/details/stackexchange" rel="noopener noreferrer"&gt;Stack Exchange Network&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you need a FREE PostgreSQL database? 🦀 Check &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven's FREE plans&lt;/a&gt;! 🦀&lt;br&gt;
If you need to optimize your SQL query?  🐧 Check &lt;a href="https://www.eversql.com/?utm_medium=organic&amp;amp;utm_source=ext_blog&amp;amp;utm_content=ftisiotstackoverflow" rel="noopener noreferrer"&gt;EverSQL&lt;/a&gt;! 🐧&lt;/p&gt;
&lt;h2&gt;
  
  
  Create a PostgreSQL database
&lt;/h2&gt;

&lt;p&gt;You can create one with &lt;a href="https://go.aiven.io/francesco-signup" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt; or just have one local, the below code works in both examples.&lt;/p&gt;
&lt;h2&gt;
  
  
  Create database tables
&lt;/h2&gt;

&lt;p&gt;Every downloaded file, referring to a specific site, contains the following files: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Posts&lt;/code&gt; &lt;/li&gt;
&lt;li&gt;&lt;code&gt;Users&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Votes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Comments&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PostHistory&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PostLinks&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The detailed schema information is available in this &lt;a href="https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede/2678#2678" rel="noopener noreferrer"&gt;meta post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We'll load the data in a two step approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First load the XML files in a temporary table&lt;/li&gt;
&lt;li&gt;Then load the data from the temporary table into the proper tables with optimised column types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmobjqsimv80kendb9fb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmobjqsimv80kendb9fb.png" alt="Architecture of the data loading phase" width="800" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To load the data we need the following data structures: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a table &lt;code&gt;data_load&lt;/code&gt; containing a unique &lt;code&gt;data&lt;/code&gt; &lt;code&gt;TEXT&lt;/code&gt; column we'll use to load the XML data row by row&lt;/li&gt;
&lt;li&gt;a set of tables matching the file structures in &lt;code&gt;Posts&lt;/code&gt;, &lt;code&gt;Users&lt;/code&gt;, &lt;code&gt;Votes&lt;/code&gt;, &lt;code&gt;Comments&lt;/code&gt;, &lt;code&gt;PostHistory&lt;/code&gt;, &lt;code&gt;PostLinks&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can create the &lt;code&gt;data_load&lt;/code&gt; table with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;data_load&lt;/code&gt; table will be used to load the XML files on the database and then parse them accordingly.&lt;/p&gt;

&lt;p&gt;The following tables are needed to properly store the data in the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;reputation&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CreationDate&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DisplayName&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LastAccessDate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;Location&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AboutMe&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;views&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;UpVotes&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DownVotes&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AccountId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;PostTypeId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CreationDate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;viewcount&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;OwnerUserId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LastActivityDate&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Tags&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AnswerCount&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CommentCount&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ContentLicense&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;badges&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dt&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tagbased&lt;/span&gt; &lt;span class="nb"&gt;boolean&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;comments&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;postId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;creationdate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;userid&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contentlicense&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;posthistory&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;PostHistoryTypeId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;postId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;RevisionGUID&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CreationDate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;userID&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ContentLicense&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;postlinks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;creationdate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;postId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;relatedPostId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LinkTypeId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tagname&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ExcerptPostId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;WikiPostId&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IsRequired&lt;/span&gt; &lt;span class="nb"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IsModeratorOnly&lt;/span&gt; &lt;span class="nb"&gt;boolean&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;votes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;postid&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;votetypeid&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;creationdate&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Load the data
&lt;/h2&gt;

&lt;p&gt;Let's download the &lt;code&gt;ai.stackexchange.com&lt;/code&gt; the section from StackExchange dedicated to AI. Within the downloaded folder we can find 8 files (1-1 with our tables). Let's try to load the &lt;code&gt;Users.xml&lt;/code&gt; first.&lt;/p&gt;

&lt;p&gt;As mentioned before, we'll perform a two step loading approach: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load the XML into a table, with each XML row on a different database row&lt;/li&gt;
&lt;li&gt;Parse the XML to populate the proper tables and columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The two step approach is needed since the original dataset threats all columns, including ids, as strings. We could either define all the Ids as strings or, do a bit more work to load the data into the proper column definitions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Load the User XML into the &lt;code&gt;data_load&lt;/code&gt; table
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l3d119rzv8wh2m0ahyb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l3d119rzv8wh2m0ahyb.png" alt="Architecture of the data loading phase - step 1" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To load the entire XML of the &lt;code&gt;Users.xml&lt;/code&gt; file into a temporary &lt;code&gt;data_load&lt;/code&gt; table we can connect to the database using &lt;code&gt;psql&lt;/code&gt; from the same folder where the &lt;code&gt;Users.xml&lt;/code&gt; is located and execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Users.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;\copy&lt;/code&gt; command allows us to load the dataset into the &lt;code&gt;data_load&lt;/code&gt; table. However we need to perform a couple of tricks in order to load it properly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tr -d "\t" &amp;lt; Users.xml&lt;/code&gt; removes the tabs from the file.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sed -e ''s/\\/\\\\/g''&lt;/code&gt; properly escapes the &lt;code&gt;\&lt;/code&gt; in the strings.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;HEADER&lt;/code&gt; avoids to load the initial &lt;code&gt;&amp;lt;?xml version="1.0" encoding="utf-8"?&amp;gt;&lt;/code&gt; row, unnecessary for our parsing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Load the data into the &lt;code&gt;Users&lt;/code&gt; table
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbin7car721jyutt78z4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbin7car721jyutt78z4f.png" alt="Architecture of the data loading phase - step 2" width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above command loads the data into the &lt;code&gt;data_load&lt;/code&gt; table with one row in the table for each row in the original file. This is not optimal but avoids having to deal with very large files included in just a single blob.&lt;/p&gt;

&lt;p&gt;We can make use the fact that each user is contained in a &lt;code&gt;&amp;lt;raw&amp;gt;&lt;/code&gt; tag to leverage PostgreSQL ability to parse XML fields with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;USERS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CreationDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DisplayName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LastAccessDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;Location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;AboutMe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Views&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UpVotes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DownVotes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;AccountId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Reputation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;CreationDate&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@DisplayName'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;DisplayName&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@LastAccessDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;LastAccessDate&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Location'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;Location&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@AboutMe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;AboutMe&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Views'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Views&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@UpVotes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;UpVotes&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@DownVotes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;DownVotes&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@AccountId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;AccountId&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Reputation'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;Reputation&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;code&gt;xpath&lt;/code&gt; to extract the relevant fields from each user&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;::text::int&lt;/code&gt; to cast the extracted field to the proper column type&lt;/li&gt;
&lt;li&gt;the filter &lt;code&gt;regexp_like(data,'[ ]+\&amp;lt;row')&lt;/code&gt; to remove the other unnecessary rows, including for example, &lt;code&gt;&amp;lt;users&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;/users&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Load the other tables
&lt;/h3&gt;

&lt;p&gt;Similar to the example with &lt;code&gt;Users.xml&lt;/code&gt; above, we can load the other tables with the following steps (please not we are truncating the &lt;code&gt;data_load&lt;/code&gt; table before loading the next file):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Loading posts&lt;/span&gt;
&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Posts.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;POSTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PostTypeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CreationDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;viewcount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OwnerUserId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LastActivityDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AnswerCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CommentCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ContentLicense&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostTypeId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Score'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@ViewCount'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Body'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@OwnerUserId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@LastActivityDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Title'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Tags'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@AnswerCount'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CommentCount'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; 
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@ContentLicense'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Loading badges&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Badges.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;BADGES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;tagbased&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@UserId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Name'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Date'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Class'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@TagBased'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;boolean&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Loading comments&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Comments.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;COMMENTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;postId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;creationdate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;userid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contentlicense&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Score'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@UserId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@ContentLicense'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Loading posthistory&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; PostHistory.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;posthistory&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PostHistoryTypeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;postId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RevisionGUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CreationDate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;userID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ContentLicense&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostHistoryTypeId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@RevisionGUID'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@UserId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@ContentLicense'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Loading postlinks&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; PostLinks.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;postlinks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;creationdate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;postId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relatedPostId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LinkTypeId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@RelatedPostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@LinkTypeId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;span class="c1"&gt;-- Loading tags&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Tags.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tagname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ExcerptPostId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WikiPostId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IsModeratorOnly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IsRequired&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@TagName'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Count'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@ExcerptPostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@WikiPostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@IsModeratorOnly'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;boolean&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@IsRequired'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;boolean&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Loading votes&lt;/span&gt;

&lt;span class="k"&gt;truncate&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="s1"&gt;'tr -d "&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s1"&gt;" &amp;lt; Votes.xml | sed -e &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s1"&gt;/g&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;votes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;postid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;votetypeid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;creationdate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@Id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@PostId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@VoteTypeId'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;,(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//@CreationDate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;data_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'//row'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;regexp_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'[ ]+&lt;/span&gt;&lt;span class="se"&gt;\&amp;lt;&lt;/span&gt;&lt;span class="s1"&gt;row'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Query Time
&lt;/h2&gt;

&lt;p&gt;Once the data is loaded, we can start querying the dataset. For example, finding the top 2 post having a comment with highest score using the following SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;POSTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;POSTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;COMMENTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;COMMENTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SCORE&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
&lt;span class="n"&gt;POSTS&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;COMMENTS&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;POSTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;COMMENTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;POSTID&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;SCORE&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enjoy the dataset and find your own questions and answers!&lt;/p&gt;

&lt;p&gt;The following links provide the entire set of &lt;a href="https://ftisiot.net/images/files/2024/stackoverflow/table_create.sql" rel="noopener noreferrer"&gt;DDLs&lt;/a&gt; and &lt;a href="https://ftisiot.net/images/files/2024/stackoverflow/data_load.sql" rel="noopener noreferrer"&gt;Loading&lt;/a&gt; SQL!&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>stackoverflow</category>
      <category>stackexchange</category>
      <category>dataload</category>
    </item>
  </channel>
</rss>
