Feb 11, 2022
Databases, machine learning and Parisian riots
In our early experiments with OtterTune, we got great results in improving the performance of MySQL and PostgreSQL by optimizing their knob configurations. Before deciding to spin out the OtterTune research project into a start-up company, we wanted to prove that it could work on real-world applications.
After publishing a blog article with Amazon about our SIGMOD 2017 paper, several companies reached out to us and asked whether we could set up OtterTune for them. One of them was a director of a database team at Societe Generale (SG), a large French bank. SG was interested in deploying OtterTune on their on-prem databases that support their IT infrastructure. The director understood the challenges of database tuning and was interested in applying new technologies, like machine learning, to alleviate performance problems.
We were excited by this collaboration! Dana Van Aken and I flew to Paris in December 2018 to meet SG’s database team and discuss the deployment. Our visit coincided with the December 8th “yellow vest” protests in Paris around economic justice and political reform. We went out on the streets in support and ended up getting tear gassed by the Parisian riot police. Nobody ever said databases were easy.
This blog article contains additional background information about the deployment of OtterTune at SG.
SG database infrastructure
The deployment of OtterTune at SG was originally supposed to target Postgres databases. SG turned out to be primarily an Oracle shop, so we created a custom version of OtterTune to support Oracle. The commercial version of OtterTune does not support this yet. If you’re interested in OtterTune for Oracle (or any other DBMS other than PostgreSQL or MySQL), you can let us know here.
Updating OtterTune to support Oracle took a few months. It mostly required changes in its agent component that connects to the DBMS and retrieves metrics/knob information. Importantly, we did not need to change the machine learning (ML) algorithms. This is because OtterTune does not employ DBMS-specific optimizations in its underlying ML models.
At the time of our deployment, SG was running most of its database applications on Oracle (v12) on a virtualized private cloud infrastructure with non-local storage. They provide self-service provisioning for DBMS deployments that use a pre-tuned configuration based on a user-specified expected workload (e.g., OLTP vs. OLAP). The initial knob configuration for each Oracle instance is selected from a set of pre-tuned configurations that SG uses for its entire database fleet.
These Oracle deployments are managed by a team of skilled DBAs with experience in knob tuning, creating the ideal situation for making a comparison between the human DBA results and automatic tuning using ML.
Target database & workload
We tested OtterTune on a 1TB Oracle v12.1 database that supported an IT issue tracking application, similar to Jira or Github Issues. We created a snapshot of the TicketTracker database from its production server using the Oracle Recovery Manager tool. The total uncompressed size of the database on disk is ∼1.1 TB, of which 27% is table data, 19% is table indexes, and 54% is large objects (LOBs). This LOB data is notable because Oracle has knobs specifically for tuning BLOG memory pools.
Up to 45% improvement
We compared the quality of the configurations OtterTune generated against DBA-selected knobs. To keep the comparison fair, we accounted for issues like long-running and failed configurations, as well as Oracle’s background maintenance tasks.
We evaluated tuning 10, 20, and 40 knobs. Our experiments showed that 10 knobs turned out, on average, to produce the best results for this workload.
OtterTune can also automatically rank knobs based on how much impact they have on the DBMS’s behavior. This ranking allows OtterTune to prioritize tuning knobs that have the highest likelihood to improve performance. We found that OtterTune’s ranking algorithm performed as well as SG’s in-house Oracle experts’ rankings. For example, both OtterTune and the DBA targeted knobs related to the DBMS’s memory allocations and query optimization. We see a similar trend when OtterTune tunes corresponding knobs in PostgreSQL.
Our results showed that our ML algorithms could generate knob configurations that improve performance by up to 45% over ones generated by a human expert.
We’ve since deployed OtterTune in many real-world environments, with terrific results. The full details of the SG project are published here in VLDB 2021.
We’d like to thank Sebastian Brillard and Christian Bilien of SG for the great collaboration and support during this deployment.