How To Analyze Managed PostgreSQL Database Statistics Using the Elastic Stack on Ubuntu 18.04

The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

Introduction

Database monitoring is the continuous process of systematically tracking various metrics that show how the database is performing. By observing the performance data, you can gain valuable insights and identify possible bottlenecks, as well as find additional ways of improving database performance. Such systems often implement alerting, which notifies administrators when things go wrong. Gathered statistics can be used to not only improve the configuration and workflow of the database, but also those of client applications.

The benefit of using the Elastic Stack (ELK stack) for monitoring your managed database is its excellent support for searching and the ability to ingest new data very quickly. It does not excel at updating the data, but this trade-off is acceptable for monitoring and logging purposes, where past data is almost never changed. Elasticsearch offers powerful means of querying the data, which you can use through Kibana to get a better understanding of how the database fares through different time periods. This will allow you to correlate database load with real-life events to gain insight into how the database is being used.

In this tutorial, you’ll import database metrics, generated by the PostgreSQL statistics collector, into Elasticsearch via Logstash. This entails configuring Logstash to pull data from the database using the PostgreSQL JDBC connector and send it to Elasticsearch for indexing immediately afterward. The imported data can later be analyzed and visualized in Kibana. Then, if your database is brand new, you’ll use pgbench, a PostgreSQL benchmarking tool, to create more interesting visualizations. In the end, you’ll have an automated system pulling in PostgreSQL statistics for later analysis.

Prerequisites

Step 1 — Setting up Logstash and the PostgreSQL JDBC Driver

In this section, you will install Logstash and download the PostgreSQL JDBC driver so that Logstash will be able to connect to your managed database.

Start off by installing Logstash with the following command:

  • sudo apt install logstash -y

Once Logstash is installed, enable the service to automatically start on boot:

  • sudo systemctl enable logstash

Logstash is written in Java, so in order to connect to PostgreSQL it requires the PostgreSQL JDBC (Java Database Connectivity) library to be available on the system it is running on. Because of an internal limitation, Logstash will properly load the library only if it is found under the /usr/share/logstash/logstash-core/lib/jars directory, where it stores third-party libraries it uses.

Head over to the download page of the JDBC library and copy the link to the latest version. Then, download it using curl by running the following command:

  • sudo curl https://jdbc.postgresql.org/download/postgresql-42.2.6.jar -o /usr/share/logstash/logstash-core/lib/jars/postgresql-jdbc.jar

At the time of writing, the latest version of the library was 42.2.6, with Java 8 as the supported runtime version. Ensure you download the latest version and pair it with a Java version that both the JDBC driver and Logstash support.
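Because curl saves whatever the server returns, a mistyped URL can leave an error page in place of the driver. The following is a minimal sketch of a sanity check, assuming a hypothetical helper named check_jar (it is not part of Logstash or the JDBC driver). It relies only on the fact that JAR files are ZIP archives, which begin with the bytes PK:

```shell
# check_jar is a hypothetical helper, not part of Logstash or the JDBC driver.
check_jar() {
    jar="$1"
    # -s: the file exists and is not empty
    if [ ! -s "$jar" ]; then
        echo "missing or empty: $jar"
        return 1
    fi
    # JAR files are ZIP archives, and ZIP archives start with the bytes "PK"
    if [ "$(head -c 2 "$jar")" = "PK" ]; then
        echo "looks like a JAR: $jar"
    else
        echo "not a JAR: $jar"
        return 1
    fi
}
```

After the download, you could call check_jar /usr/share/logstash/logstash-core/lib/jars/postgresql-jdbc.jar to confirm that the file is in place.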

Logstash stores its configuration files under /etc/logstash/conf.d, and its executables are stored under /usr/share/logstash/bin. Before you create a configuration that will pull statistics from your database, you’ll need to install the JDBC input plugin in Logstash by running the following command:

  • sudo /usr/share/logstash/bin/logstash-plugin install logstash-input-jdbc

You’ve installed Logstash using apt and downloaded the PostgreSQL JDBC library so that Logstash can use it to connect to your managed database. In the next step, you will configure Logstash to pull statistical data from it.

Step 2 — Configuring Logstash To Pull Statistics

In this section, you will configure Logstash to pull metrics from your managed PostgreSQL database.

You’ll configure Logstash to watch over three system statistics views in PostgreSQL, namely:

  • pg_stat_database: provides statistics about each database, including its name, number of connections, transactions, rollbacks, rows returned by querying the database, deadlocks, and so on. It has a stats_reset field, which specifies when the statistics were last reset.
  • pg_stat_user_tables: provides statistics about each table created by the user, such as the number of inserted, deleted, and updated rows.
  • pg_stat_user_indexes: collects data about all indexes in user-created tables, such as the number of times a particular index has been scanned.

You’ll store the configuration for indexing PostgreSQL statistics in Elasticsearch in a file named postgresql.conf under the /etc/logstash/conf.d directory, where Logstash stores configuration files. When started as a service, it will automatically run them in the background.

Create postgresql.conf using your favorite editor (for example, nano):

  • sudo nano /etc/logstash/conf.d/postgresql.conf

Add the following lines:

/etc/logstash/conf.d/postgresql.conf
input {
        # pg_stat_database
        jdbc {
                jdbc_driver_library => ""
                jdbc_driver_class => "org.postgresql.Driver"
                jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                jdbc_user => "username"
                jdbc_password => "password"
                statement => "SELECT * FROM pg_stat_database"
                schedule => "* * * * *"
                type => "pg_stat_database"
        }

        # pg_stat_user_tables
        jdbc {
                jdbc_driver_library => ""
                jdbc_driver_class => "org.postgresql.Driver"
                jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                jdbc_user => "username"
                jdbc_password => "password"
                statement => "SELECT * FROM pg_stat_user_tables"
                schedule => "* * * * *"
                type => "pg_stat_user_tables"
        }

        # pg_stat_user_indexes
        jdbc {
                jdbc_driver_library => ""
                jdbc_driver_class => "org.postgresql.Driver"
                jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                jdbc_user => "username"
                jdbc_password => "password"
                statement => "SELECT * FROM pg_stat_user_indexes"
                schedule => "* * * * *"
                type => "pg_stat_user_indexes"
        }
}

output {
        elasticsearch {
                hosts => "http://localhost:9200"
                index => "%{type}"
        }
}

Remember to replace host with your host address, port with the port to which you can connect to your database, username with the database user username, and password with its password. All these values can be found in the Control Panel of your managed database.

In this configuration, you define three JDBC inputs and one Elasticsearch output. The three inputs pull data from the pg_stat_database, pg_stat_user_tables, and pg_stat_user_indexes views, respectively. They all set the jdbc_driver_library parameter to an empty string, because the PostgreSQL JDBC library is in a folder that Logstash automatically loads.

Then, they set the jdbc_driver_class, whose value is specific to the JDBC library, and provide a jdbc_connection_string, which details how to connect to the database. The jdbc: part signifies that it is a JDBC connection, while postgresql:// indicates that the target database is PostgreSQL. Next come the host and port of the database, and after the forward slash you also specify a database to connect to; this is because PostgreSQL requires you to be connected to a database to be able to issue any queries. Here, it is set to the default database that always exists and cannot be deleted, aptly named defaultdb.
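To make the structure of the connection string concrete, here is a small sketch that assembles it from its parts. The helper name jdbc_url and the sample host and port are placeholders chosen for illustration; substitute the values from your Control Panel:

```shell
# jdbc_url is a hypothetical helper that assembles the connection string
# from its parts: jdbc:postgresql://<host>:<port>/<database>
jdbc_url() {
    host="$1"
    port="$2"
    database="${3:-defaultdb}"   # falls back to the default database from this tutorial
    echo "jdbc:postgresql://${host}:${port}/${database}"
}

# Placeholder host and port, for illustration only.
jdbc_url db.example.com 25060
# prints: jdbc:postgresql://db.example.com:25060/defaultdb
```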

Next, they set a username and password of the user through which the database will be accessed. The statement parameter contains a SQL query that should return the data you wish to process. In this configuration, it selects all rows from the appropriate view.

The schedule parameter accepts a string in cron syntax that defines when Logstash should run this input; omitting it completely will make Logstash run it only once. Specifying * * * * *, as you have done here, will tell Logstash to run it every minute. You can specify your own cron string if you want to collect data at different intervals.
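If you later want a different interval, one option is to rewrite the schedule lines in place. The following is only a sketch: set_schedule is a hypothetical helper, it assumes GNU sed (as shipped with Ubuntu), and it changes every schedule line in the file it is given:

```shell
# set_schedule is a hypothetical helper: it rewrites every schedule line
# in the given Logstash configuration file to use a new cron expression.
set_schedule() {
    file="$1"
    cron="$2"
    # "|" is the sed delimiter, so "/" in the cron string needs no escaping
    sed -i "s|schedule => \".*\"|schedule => \"${cron}\"|" "$file"
}
```

For example, running set_schedule on a copy of postgresql.conf with '*/5 * * * *' as the second argument would make all three inputs run every five minutes. Trying it on a copy first is advisable, since the real file under /etc/logstash/conf.d is owned by root.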

There is only one output, which accepts data from all three inputs and sends it to Elasticsearch, which is running locally and is reachable at http://localhost:9200. The index parameter defines to which Elasticsearch index it will send the data, and its value is taken from the type field set by each input.

When you are done with editing, save and close the file.

You’ve configured Logstash to gather data from various PostgreSQL statistical tables and send them to Elasticsearch for storage and indexing. Next, you’ll run Logstash to test the configuration.

Step 3 — Testing the Logstash Configuration

In this section, you will test the configuration by running Logstash to verify it will properly pull the data. Then, you will make this configuration run in the background by configuring it as a Logstash pipeline.

Logstash supports running a specific configuration by passing its file path to the -f parameter. Run the following command to test your new configuration from the last step:

  • sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/postgresql.conf

It may take some time before it shows any output, which will look similar to this:

Output
Thread.exclusive is deprecated, use Thread::Mutex
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2019-08-02 18:29:15.123 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2019-08-02 18:29:15.154 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"7.3.0"}
[INFO ] 2019-08-02 18:29:18.209 [Converge PipelineAction::Create<main>] Reflections - Reflections took 77 ms to scan 1 urls, producing 19 keys and 39 values
[INFO ] 2019-08-02 18:29:20.195 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[WARN ] 2019-08-02 18:29:20.667 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"}
[INFO ] 2019-08-02 18:29:21.221 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7}
[WARN ] 2019-08-02 18:29:21.230 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[INFO ] 2019-08-02 18:29:21.274 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]}
[INFO ] 2019-08-02 18:29:21.337 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[WARN ] 2019-08-02 18:29:21.369 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"}
[INFO ] 2019-08-02 18:29:21.386 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7}
[WARN ] 2019-08-02 18:29:21.386 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[INFO ] 2019-08-02 18:29:21.409 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]}
[INFO ] 2019-08-02 18:29:21.430 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[WARN ] 2019-08-02 18:29:21.444 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"}
[INFO ] 2019-08-02 18:29:21.465 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7}
[WARN ] 2019-08-02 18:29:21.466 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[INFO ] 2019-08-02 18:29:21.468 [Ruby-0-Thread-7: :1] elasticsearch - Using default mapping template
[INFO ] 2019-08-02 18:29:21.538 [Ruby-0-Thread-5: :1] elasticsearch - Using default mapping template
[INFO ] 2019-08-02 18:29:21.545 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]}
[INFO ] 2019-08-02 18:29:21.589 [Ruby-0-Thread-9: :1] elasticsearch - Using default mapping template
[INFO ] 2019-08-02 18:29:21.696 [Ruby-0-Thread-5: :1] elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}
[INFO ] 2019-08-02 18:29:21.769 [Ruby-0-Thread-7: :1] elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}
[INFO ] 2019-08-02 18:29:21.771 [Ruby-0-Thread-9: :1] elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}
[WARN ] 2019-08-02 18:29:21.871 [[main]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been create for key: cluster_uuids. This may result in invalid serialization. It is recommended to log an issue to the responsible developer/development team.
[INFO ] 2019-08-02 18:29:21.878 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>125, :thread=>"#<Thread:0x470bf1ca run>"}
[INFO ] 2019-08-02 18:29:22.351 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
[INFO ] 2019-08-02 18:29:22.721 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2019-08-02 18:29:23.798 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated
[INFO ] 2019-08-02 18:30:02.333 [Ruby-0-Thread-22: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:284] jdbc - (0.042932s) SELECT * FROM pg_stat_user_indexes
[INFO ] 2019-08-02 18:30:02.340 [Ruby-0-Thread-23: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:331] jdbc - (0.043178s) SELECT * FROM pg_stat_user_tables
[INFO ] 2019-08-02 18:30:02.340 [Ruby-0-Thread-24: :1] jdbc - (0.036469s) SELECT * FROM pg_stat_database
...

If Logstash does not show any errors and logs that it has successfully SELECTed rows from the three views, your database metrics will be shipped to Elasticsearch. If you get an error, double check all the values in the configuration file and ensure that the machine you’re running Logstash on can connect to the managed database.
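To confirm that the data has arrived, you can list the indexes Elasticsearch holds and look for the three index names from Step 2. The sketch below assumes a hypothetical helper, check_indices, that reads such a listing on standard input; it does not contact Elasticsearch itself:

```shell
# check_indices is a hypothetical helper. It reads an index listing on
# standard input and reports whether each index named in Step 2 is present.
check_indices() {
    listing="$(cat)"
    for index in pg_stat_database pg_stat_user_tables pg_stat_user_indexes; do
        # -w matches whole words only; -q suppresses grep's own output
        if printf '%s\n' "$listing" | grep -qw "$index"; then
            echo "found: $index"
        else
            echo "missing: $index"
        fi
    done
}
```

One way to feed it is to pipe in the output of the Elasticsearch _cat/indices API, for example curl -s 'http://localhost:9200/_cat/indices?v' | check_indices, assuming Elasticsearch is reachable at the address used in the Logstash output section.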

Logstash will continue importing the data at specified times. You can safely stop it by pressing CTRL+C.

As previously mentioned, when started as a service, Logstash automatically runs all configuration files it finds under /etc/logstash/conf.d in the background. Run the following command to start it as a service:

  • sudo systemctl start logstash

In this step, you ran Logstash to check if it can connect to your database and gather data. Next, you’ll visualize and explore some of the statistical data in Kibana.

Step 4 — Exploring Imported Data in Kibana

In this section, you’ll see how you can explore the statistical data describing your database’s performance in Kibana.

In your browser, navigate to the Kibana installation you set up as a prerequisite. You’ll see the default welcome page.

Kibana - Default Welcome Page

To interact with Elasticsearch indexes in Kibana, you’ll need to create an index pattern. Index patterns specify on which indexes Kibana should operate. To create one, press on the last icon (wrench) from the left-hand vertical sidebar to open the Management page. Then, from the left menu, press on Index Patterns under Kibana. You’ll see a dialog box for creating an index pattern.

Kibana - Add Index Pattern

Listed are the three indexes where Logstash has been sending statistics. Type in pg_stat_database in the Index Pattern input box and then press Next step. You’ll be asked to select a field that stores time, so you’ll be able to later narrow your data by a time range. From the dropdown, select @timestamp.

Kibana - Index Pattern Timestamp Field

Press on Create index pattern to finish creating the index pattern. You’ll now be able to explore it using Kibana. To create a visualization, press on the second icon in the sidebar, and then on Create new visualization. Select the Line visualization when the form pops up, and then choose the index pattern you have just created (pg_stat_database). You’ll see an empty visualization.

Kibana - Empty Visualisation

The resulting plot occupies the central part of the screen. The panel on the left governs how it is generated, and it is where you set the data for the X and Y axes. In the upper right-hand side of the screen is the date range picker. Unless you specifically choose another range when configuring the data, that range will be shown on the plot.

You’ll now visualize the average number of data tuples INSERTed for each minute in the given interval. Press on Y-Axis under Metrics in the panel on the left to unfold it. Select Average as the Aggregation and select tup_inserted as the Field. This will populate the Y axis of the plot with the average values.

Next, press on X-Axis under Buckets. For the Aggregation, choose Date Histogram. @timestamp should be automatically selected as the Field. Then, press on the blue play button on the top of the panel to generate your graph. If your database is brand new and not used, you won’t see anything yet. In all cases, however, you will see an accurate portrayal of database usage.
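The chart you have just configured corresponds roughly to an Elasticsearch date histogram aggregation with an average nested inside it. As a rough sketch, the following hypothetical helper prints a search body along those lines. It assumes Elasticsearch 7.2 or later, where date_histogram accepts fixed_interval, and the aggregation names over_time and avg_inserted are arbitrary:

```shell
# avg_inserted_query is a hypothetical helper that prints a search body:
# bucket documents by @timestamp, then average tup_inserted per bucket.
avg_inserted_query() {
    interval="${1:-1m}"   # bucket width, one minute unless specified
    cat <<EOF
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "${interval}" },
      "aggs": {
        "avg_inserted": { "avg": { "field": "tup_inserted" } }
      }
    }
  }
}
EOF
}
```

To try it, you could pipe the body to the search endpoint of the index, for example avg_inserted_query 5m | curl -s -H 'Content-Type: application/json' -d @- 'http://localhost:9200/pg_stat_database/_search?pretty'. Kibana builds its own request, so treat this as an approximation of the chart rather than the exact query it sends.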

Kibana supports many other visualization forms—you can explore other forms in the Kibana documentation. You can also add the two remaining indexes, mentioned in Step 2, into Kibana to be able to visualize them as well.

In this step, you have learned how to visualize some of the PostgreSQL statistical data, using Kibana.

Step 5 — (Optional) Benchmarking Using pgbench

If you haven’t yet worked in your database outside of this tutorial, you can complete this step to create more interesting visualizations by using pgbench to benchmark your database. pgbench will run the same SQL commands over and over, simulating real-world database use by an actual client.

You’ll first need to install pgbench by running the following command:

  • sudo apt install postgresql-contrib -y

Because pgbench will insert and update test data, you’ll need to create a separate database for it. To do so, head over to the Users & Databases tab in the Control Panel of your managed database, and scroll down to the Databases section. Type in pgbench as the name of the new database, and then press on Save. You’ll pass this name, as well as the host, port, and username information to pgbench.

Accessing Databases section in DO control panel

Before actually running pgbench, you’ll need to run it with the -i flag to initialize its database:

  • pgbench -h host -p port -U username -i pgbench

You’ll need to replace host with your host address, port with the port to which you can connect to your database, and username with the database user username. You can find all these values in the Control Panel of your managed database.

Notice that pgbench does not have a password argument; instead, you’ll be asked for it every time you run it.
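If typing the password repeatedly becomes tedious, PostgreSQL client programs such as pgbench can also read it from the PGPASSWORD environment variable or from a ~/.pgpass file, where each line has the form hostname:port:database:username:password. The helper below is a hypothetical sketch that only formats such a line; it does not escape colons or backslashes, which the password file format requires to be prefixed with a backslash:

```shell
# pgpass_line is a hypothetical helper that prints one password-file entry
# in the format hostname:port:database:username:password
pgpass_line() {
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

You could append its output to ~/.pgpass and then run chmod 600 ~/.pgpass, because the file is ignored if other users can read it.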

The output will look like the following:

Output
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 100000 tuples (100%) done (elapsed 0.16 s, remaining 0.00 s)
vacuum...
set primary keys...
done.

pgbench created four tables, which it will use for benchmarking, and populated them with some example rows. You’ll now be able to run benchmarks.

Two arguments limit how long the benchmark will run: -t, which specifies the number of transactions to complete, and -T, which defines how many seconds the benchmark should run. These two options are mutually exclusive. At the end of each benchmark, you’ll receive statistics, such as the number of transactions per second (tps).

Now, start a benchmark that will last for 30 seconds by running the following command:

  • pgbench -h host -p port -U username pgbench -T 30

The output will look like:

Output
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 7602
latency average = 3.947 ms
tps = 253.382298 (including connections establishing)
tps = 253.535257 (excluding connections establishing)

In this output, you see the general info about the benchmark, such as the total number of transactions executed. The effect of these benchmarks is that the statistics Logstash ships to Elasticsearch will reflect that number, which will in turn make visualizations in Kibana more interesting and closer to real-world graphs. You can run the preceding command a few more times, and possibly alter the duration.
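If you run several benchmarks and want to compare them, it can help to pull out just the tps figure. The following sketch assumes the output format shown above (other pgbench versions may word the tps lines differently) and uses a hypothetical helper named extract_tps:

```shell
# extract_tps is a hypothetical helper. It reads pgbench output on standard
# input and prints the tps figure that excludes connection setup time.
extract_tps() {
    # Matching line: "tps = 253.535257 (excluding connections establishing)"
    # awk fields:     $1  $2 $3
    awk '/^tps = / && /excluding/ { print $3 }'
}
```

You would pipe the benchmark output into it, for example by appending | extract_tps to the pgbench command above.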

When you are done, head over to Kibana and press on Refresh in the upper right corner. You’ll now see a different line than before, which shows the number of INSERTs. Feel free to change the time range of the data shown by changing the values in the picker positioned above the refresh button. Here is how the graph may look after multiple benchmarks of varying duration:

Kibana - Visualization After Benchmarks

You’ve used pgbench to benchmark your database, and evaluated the resulting graphs in Kibana.

Conclusion

You now have the Elastic Stack installed on your server and configured to pull statistics data from your managed PostgreSQL database on a regular basis. You can analyze and visualize the data using Kibana, or some other suitable software, which will help you gather valuable insights and real-world correlations into how your database is performing.

For more information about what you can do with your PostgreSQL Managed Database, visit the product docs.

