This blog is based on a webinar given by Dr. Dana Van Aken, OtterTune’s CTO, on February 15, 2023. You can check out the full webinar here.
I’ve been working on machine learning-based automatic database tuning techniques since 2015, when I started my doctoral studies at Carnegie Mellon University (CMU). Andy Pavlo, OtterTune CEO, was my Ph.D. advisor, and Bohan Zhang, OtterTune Co-founder, was a master’s student that worked with us on the project. We published our first research paper on OtterTune at SIGMOD, one of the big academic conferences for databases, in 2017. AWS invited us to write a guest blog post for their Machine Learning Blog a few months later. This got our work the kind of attention you don’t typically get at academic conferences. In the summer of 2020, we decided to commercialize OtterTune after receiving several inquiries from industry folks interested in trying it. Shortly after founding the company, I completed my Ph.D. in Databaseology and have been full-time at OtterTune ever since.
In 2022, we launched a bunch of new features that we called OtterTune v1.5. At the end of March, we’re releasing OtterTune v2.0. Here’s more on where we came from, where we are now, and where we’re headed in building an AI-optimized database tuning platform.
Current State, Early 2023
The original OtterTune research project focused only on configuration tuning. Specifically, we developed machine learning techniques to automatically tune a database’s configuration parameters (aka “knobs”) for whatever target objective the user wishes to optimize for, like reducing latency or costs.
To put how important this is in context, we’ve found that about 30% of the production databases use default configuration parameters, which are set to be compatible with the minimum hardware requirements of MySQL/Postgres and too conservative for most modern systems. If you tune your database, you can improve performance and save money, but it’s a tedious task that requires in-house expertise or taking the time to do it yourself. Hiring a database administrator is prohibitively expensive for many companies, so other technical employees—engineers or DevOps—get stuck with the responsibility. Without formal training or previous experience, they rely on documentation and blog posts to tune their databases.
Database performance tuning covers a variety of activities, not just configuration tuning, so we expanded the service to provide a more holistic database tuning experience for our customers. As part of OtterTune v1.5, added health checks to help tune indexes and identify other problems related to resource utilization, cache management, and general database efficiency.
How OtterTune works
OtterTune connects to your database. We support AWS RDS and Aurora, including serverless, for Postgres and MySQL.
Step 1: Permissioning- The user grants OtterTune certain AWS IAM permissions so we can collect metadata from RDS about your database instances, and metrics from Cloudwatch and Performance Insights.
Step 2: Set up telemetry collection- OtterTune supplies an open-source agent that can be deployed in your infrastructure that’s able to connect directly to your database. This lets us collect telemetry, which includes internal runtime metrics, query summaries, etc.—the data that OtterTune uses to tune your database.
Step 3: Start the tuning process- The agent receives telemetry data from the database and AWS from the standard APIs, storing that data in our central repository.
Step 4: Training the model- The compute engine begins training models to predict database behavior using the data it just collected, plus all of the previous data from the database.
Step 5: Generating a better config- OtterTune generates a config that should improve performance based on what it’s seen in the past. It also uses the performance data to generate health check recommendations, which show up in the OtterTune interface.
As you continue using OtterTune and it observes more data, your configs improve. Once a new config is available, it gets passed back to the agent, which then applies it to the database. There are a few different ways to do this. There’s a fully manual option where you can update the recommended configuration parameters in the database yourself. Then, there’s a fully autonomous setting, where OtterTune continues to apply new configs on behalf of the user. Finally, there’s a human-in-the-loop option, where OtterTune prompts the user to review and approve the config before applying it. We also periodically generate health checks, with the understanding that this is an ongoing process, and failures do occur over time.
OtterTune v1.5 Features
Health checks automatically detect issues in your database and provide recommendations on how to fix the problem. Unlike the config applications, health checks are recommended by OtterTune and users elect to apply them. We provide them at a few different levels: database, index, table, and autovacuum.
Exactly which health checks you see depends on your database, but the goal of all of them is to catch the hidden issues that are degrading performance and costing unnecessary usage fees, including storage and instance type.
Database checks cover database admin issues, including resource utilization, cache management, and database efficiency. (Check out this clip to see the most database failed checks.)
Index checks look at unused indexes and duplicate indexes so you can free up disk space. (See the percentage of failures of these checks here.)
Table checks analyze up to a thousand tables in your database. For MySQL, it checks the fragmentation ratio and gives recommendations you can use to improve performance and reduce wasted storage. For Postgres, we have three checks: dead tuples, stale statistics, and excessive reads. All of these checks are aimed at dealing with the side effects of Postgres quirks, like how it handles versioning, to make sure performance stays high and you’re not wasting a bunch of storage space. (For more context, jump to the Table Checks section of the talk.)
Autovacuum checks, for Postgres only, check stuck autovacuum and excessive scans to indicate if there are active scans in progress and whether they’re able to succeed.
Less manual, more deployment options, and extended data collection.
New deployment option for Terraform: The first step in our onboarding process is creating the IAM role to connect your account to OtterTune so we can get the data we need from AWS. We added a new deployment option for Terraform.
Better UX for connecting your AWS account to OtterTune: We’ve also made it easier to create the AWS IAM role for OtterTune. Previously, it was a bit time consuming and manual, with a lot of copy-and-pasting back and forth. Now, you press “Connect,” switch over to the AWS CloudFormation and accept all permissioning, and create your stack. Once you’re done, the resource ID is automatically grabbed and stored. There’s much less effort required or room for human error.
Improvements to the open-source agent: We’ve added more deployment methods: Kubernetes manifest and Helm chart. We’re excited about this and have seen a lot of usage.
To support the latest index- and query-tuning features, our agent collects more data including schema information, index/table statistics, and anonymized query digests (all of this is configurable if you don’t want us to).
We also added support for collecting data from multiple logical databases for Postgres. We’ve always supported this for MySQL, but for Postgres, we had a limitation where we could only collect data from one logical database. Now we can collect from more.
Account management and security improvements
Better self-service, a Slackbot, and security compliance certification
OtterTune Slackbot: OtterTune’s Slackbot will send you a message when there’s a new config ready for your review (human-in-the-loop config application only); when OtterTune disables tuning (this happens if there’s a major change in your database hardware and OtterTune isn’t sure that the previous data will apply in future state); and when OtterTune applies a new configuration, plus the before and after values.
Add members to your team account: On the admin side, you can invite additional team members to join your OtterTune account, either new or existing users.
Better subscription management: You can manage your plan and subscriptions from the OtterTune interface. We’re also on the AWS marketplace, where you can pay in AWS credits.
Feature requests: You can tell us what you want and how badly you want it! We do check it and use it to guide future feature development. You can add a feature or vote here.
Support for multiple AWS accounts per one OtterTune account: You can add multiple AWS accounts to one OtterTune account and switch between accounts.
SOC 2 Type II Certification: We take security very seriously. We touch critical data and systems and know how important it is to our users that we continue to make a good-faith effort to show we don’t take that responsibility lightly.
Future state: What’s coming, March 2023
At the end of March, we’re launching OtterTune v2.0. It’s a massive launch, including a complete redesign and a bunch of new features.
Database Score: We analyze database health across a series of metrics, not just your target performance optimization metric. We distill it down to one score representative of your overall database health.
Query tuning: This was a huge request from customers. More to come soon!
Unified ML and health check recommendations: We’ve combined the previously-siloed ML config tuning and health check config features. The unified recommendation system incorporates both of these techniques when deciding which config to try next.
Better UX: Our redesign is focused on making OtterTune easier to use for devs. The joke is, we built the initial UX around what we wanted as graduate students—complex, highly flexible, with a lot of different tuning options. For v2.0, we simplified the tuning options and reworked the dashboard so you’ll see the most important recommendations and issues for your databases. We didn’t take away any functionality, we just brought the most important things you use all the time closer to the surface.
Stay in the loop
We’re launching OtterTune v2.0 at the end of March. We think it’s both an improvement for our product and sets the bar higher for what AI can do for databases. Follow along on Twitter and LinkedIn for more info on the launch, and tell us what you think on our Slack Community.
To learn more about our release, check out our documentation.