Excessive disk space usage by postgres due to monitoring table

We have observed that a high volume of monitoring data updates can bloat the PostgreSQL database, causing the contents of `/var/cfengine/state/pg` to grow unexpectedly.

To check for actual disk space usage by all postgres tables, please run the attached SQL as follows:

/var/cfengine/bin/psql cfdb < db-disk-usage.sql

You will most likely see the "__monitoringmg" table at the top. If so, proceed with the Immediate recovery steps below. If other tables are using excessive disk space (which is rare), please contact us for further assistance.
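The attached `db-disk-usage.sql` is not reproduced in this article. As a rough stand-in, the sketch below prints a query that lists the ten largest tables using the standard PostgreSQL functions `pg_total_relation_size` and `pg_size_pretty`. The function name `disk_usage_sql` is hypothetical, and the attached script may well differ.

```shell
#!/bin/sh
# Hypothetical helper: prints a query listing the ten largest user tables.
# Illustrative stand-in only, not the attached db-disk-usage.sql.
disk_usage_sql() {
  cat <<'SQL'
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
SQL
}
```

It could be piped into the same client as above, e.g. `disk_usage_sql | /var/cfengine/bin/psql cfdb`. The reported size includes each table's indexes and TOAST data.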

Immediate recovery

For immediate recovery, we recommend the following:

  • Stop CFEngine: `service cfengine3 stop`
  • Free up some space in the partition where `/var/cfengine/state/pg` resides.
  • Start only PostgreSQL with:
cd /tmp && su cfpostgres -c "/var/cfengine/bin/pg_ctl -w -D /var/cfengine/state/pg/data -l /var/log/postgresql.log start"
  • Remove the monitoring data:
echo "TRUNCATE __monitoringmg" | /var/cfengine/bin/psql cfdb
  • You can now stop the postgres processes and start CFEngine normally (e.g. using init.d).
  • Disable monitoring data. In `masterfiles/lib/3.6/reports.cf`, empty `monitoring_include` in both `body report_data_select default_data_select_host` and `default_data_select_hub`:
monitoring_include => { "" };

Note: This disables monitoring data from all clients. To selectively include or exclude monitoring data collection, create a separate report_data_select body: https://docs.cfengine.com/docs/3.6/reference-promise-types-access.html#report_data_select

Please let us know if you need additional help with this.

  • Automatically truncate the monitoring tables using policy. To make sure that the monitoring tables do not fill up the disk again, add the following policy to masterfiles (e.g. in promises.cf) and include it in the bundlesequence. It runs at most once a day, only on the hub, and should complete within seconds.

bundle agent cfdb_truncate_monitoring
{
  vars:
    # Monitoring tables to empty; the promise below iterates over this list.
    "monitoring_tables" slist => { "__monitoringmg", "__monitoringyr", "__monitoringhg" };

  commands:

    # Run on the hub only, at most once every 1440 minutes (daily).
    policy_server::
      "/bin/echo \"TRUNCATE $(monitoring_tables)\" | /var/cfengine/bin/psql cfdb"
        action => if_elapsed("1440"),
        contain => in_shell,
        handle => "truncate_$(monitoring_tables)";

}

  • `service cfengine3 start`
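Once the steps above are complete, it may be worth confirming that the partition holding `/var/cfengine/state/pg` has free space again. The following is a minimal sketch, assuming a POSIX `df`. The function names `partition_usage_percent` and `has_free_space` and the default threshold of 90 percent are illustrative choices, not values taken from CFEngine.

```shell
#!/bin/sh
# Print the percentage of space used on the filesystem containing the path.
partition_usage_percent() {
  # df -P gives the portable POSIX format; field 5 of line 2 is "NN%".
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is below the threshold (default 90, an arbitrary value).
has_free_space() {
  usage=$(partition_usage_percent "$1")
  [ -n "$usage" ] && [ "$usage" -lt "${2:-90}" ]
}
```

For example, `has_free_space /var/cfengine/state/pg || echo "still low on disk space"` prints a warning while the partition remains at or above 90 percent used.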

Next steps and recommendations

Send us the following information:

  1. What are the disk type and filesystem of the / partition?
  2. More logs would be helpful:
    • /var/log/postgresql.log
    • /var/log/messages or syslog
    • /var/cfengine/outputs
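The logs listed above can be gathered into a single archive before sending them. The following is a minimal sketch; the function name `collect_logs` and the output file name in the usage example are hypothetical, and `/var/log/syslog` is assumed as the syslog location on systems that do not use `/var/log/messages`.

```shell
#!/bin/sh
# Bundle whichever of the given paths exist into a gzip-compressed tarball.
# Usage: collect_logs OUTPUT.tar.gz PATH...
collect_logs() {
  out=$1; shift
  n=$#
  for p in "$@"; do
    # Append existing paths after the original arguments; missing ones are skipped.
    if [ -e "$p" ]; then set -- "$@" "$p"; fi
  done
  # Drop the original arguments, keeping only the paths that exist.
  shift "$n"
  [ "$#" -gt 0 ] || return 1
  tar -czf "$out" "$@"
}
```

A possible invocation, run as root so that the logs are readable: `collect_logs /tmp/cfengine-logs.tar.gz /var/log/postgresql.log /var/log/messages /var/log/syslog /var/cfengine/outputs`.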

Recommendations:

  1. We recommend increasing the disk space, possibly doubling it. We will provide more accurate figures after further tests and feedback from users.
  2. (Optional) We recommend mounting /var/cfengine/state/pg on a separate disk if possible (preferably a faster one) for better performance and predictability of disk usage patterns. If you decide to do so, please let us know if you need help migrating the existing installation to a separate partition.

 
