Jul 7, 2021
How to prevent machine learning from wrecking your database
Dana Van Aken
Companies are moving their databases to hosted database services, such as Amazon RDS, to take advantage of the flexibility and scalability of cloud deployments. But what people do not often realize is that while these “fully managed” services simplify database provisioning and maintenance, customers are still responsible for some critical administrative tasks that are not easily automated. Database configuration tuning is one of them.
Optimizing a database’s configuration knobs for the workload speeds up performance and scales your systems, thereby lowering costs. Amazon already tweaks a few of MySQL and PostgreSQL out-of-the-box default settings for its RDS deployments based on the instance type. However, they do not tune them for any specific workload.
OtterTune automates the process of optimizing a database’s configuration knobs for its specific workload by using machine learning (ML). We showed in a previous post that OtterTune achieves a 3x performance improvement for a PostgreSQL database by tuning its configuration for its workload versus using the default configuration from Amazon RDS.
The OtterTune ML algorithms make decisions based on their training data, and sometimes these decisions can be unpredictable because the algorithms do not interpret the world the same way humans do. For this reason, OtterTune has safeguards in place to ensure that ML does not harm your database.
We have learned a lot about how an ML-based DBMS tuner fits an organization’s established database administration policies and practices. These experiences have led us to develop operational settings that allow humans to control aspects of the automated tuning process. This article discusses which configuration knobs OtterTune can tune and the features available to users for managing this.
Types of knobs
A DBMS has hundreds of configuration knobs that control aspects of its runtime operations. These knobs typically fall into one of the following categories:
Knobs in this category specify how much of a resource the system uses for a task. These can be either for fixed components (e.g., number of vacuum workers) or for dynamic activities (e.g., amount of memory to use per query).
Examples of resource knobs for MySQL and PostgreSQL:
DBMS = MySQL
Knob = innodb_buffer_pool_size (fixed component)
Description = The size in bytes of the buffer pool, the memory area where InnoDB caches table and index data.
DBMS = PostgreSQL
Knob = work_mem (dynamic activity)
Description = Sets the base maximum amount of memory to be used by a query operation (such as a sort or hash table) before writing to temporary disk files.
These configuration knobs control how the DBMS behaves for specific tasks. For example, a knob can control whether the DBMS flushes the write-ahead log to disk when a transaction commits.
Examples of policy knobs for MySQL and PostgreSQL:
DBMS = MySQL
Knob = innodb_flush_method
Description = Defines the method used to flush data to InnoDB data files and log files, which can affect I/O throughput.
DBMS = PostgreSQL
Knob = wal_compression
Description = When this parameter is on, the PostgreSQL server compresses a full page image written to WAL.
Lastly, location knobs specify where the DBMS finds resources that it needs (e.g., file paths) and how it interacts with the outside world (e.g., network port number).
Examples of location knobs for MySQL and PostgreSQL:
DBMS = MySQL
Knob = port
Description = The number of the port on which the server listens for TCP/IP connections.
DBMS = PostgreSQL
Knob = config_file
Description = Specifies the main server configuration file (customarily called postgresql.conf).
Paws off! Controlling which knobs OtterTune configures
We allow OtterTune to tune most knobs that control resources and policies. However, there are a few knobs that can have negative consequences like potentially causing the DBMS to lose data. We do not let OtterTune tune these knobs.
These restrictions are to avoid exogenous problems that are not readily measurable by the service. For example, the algorithms will learn that turning off disk writes for transaction commits makes the DBMS run faster. But now, if the system crashes, it might lose data because it no longer flushes data to the disk. The ML algorithm has no way of knowing whether that is a problem because that is an external cost that it cannot measure. A human has to make that value judgment. Thus, we disallow OtterTune from making those types of changes to a DBMS’s configuration.
OtterTune will not try to tune your port number or any other location knob for obvious reasons.
Examples of disallowed knobs for MySQL:
Knob = innodb_doublewrite
Why it's disallowed = Controls whether the doublewrite buffer is enabled. You might consider disabling the doublewrite buffer if you are more concerned with performance than data integrity, as may be the case when performing benchmarks, for example.
Knob = innodb_flush_at_trx_commit
Why it's disallowed = Controls the balance between strict ACID compliance for commit operations and higher performance that is possible when commit-related I/O operations are rearranged and done in batches. You can achieve better performance by changing the default value but then you can lose transactions in a crash.
Examples of disallowed knobs for PostgreSQL:
Knob = fsync
Why it's disallowed = If this parameter is on, the PostgreSQL server will try to make sure that updates are physically written to disk, by issuing fsync() system calls. While turning off fsync is often a performance benefit, this can result in unrecoverable data corruption in the event of a power failure or system crash
Knob = full_page_writes
Why it's disallowed = When this parameter is on, the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint. Turning this parameter off speeds normal operation, but might lead to either unrecoverable data corruption, or silent data corruption, after a system failure.
Knob value ranges are controlled, too
OtterTune also suggests a permitted range of values for each knob. This range is important for knobs that allocate system resources since setting a knob larger than what’s available can cause the database to crash. OtterTune collects hardware information from the user’s deployment and computes the suggested ranges based on the amount of a resource available.
OtterTune limits the range of some policy knobs as well. For example, many knobs that specify timeouts are safe to tune but only within a specific range.
Users can configure which knobs to tune and their ranges when connecting a database to the service for the first time. Once setup is complete, users can modify their settings from the options panel.