Mar 9, 2022

Ten database commandments

Andy Pavlo

dj ot with notorious big
dj ot with notorious big

Today is a sorrowful day for me because it marks the 25-year anniversary of The Notorious B.I.G.‘s murder. The seminal hip-hop artist was shot four times in a drive-by shooting in Los Angeles on March 9th, 1997. He was only 24 years old. This was a rough time in hip hop because Biggie was killed just six months after Tupac was shot in Las Vegas.

One of Biggie’s most popular songs (and probably my favorite) is Ten Crack Commandments, a step-by-step guide to achieving success in the game. The song is based on Biggie’s experience in slinging rock at age 12 on Fulton Street in Brooklyn. Biggie spits fire over the beats produced by DJ Premier. Ten Crack Commandments is on Biggie’s final studio album, Life After Death, released 16 days after his death.

At OtterTune, we take the database hustle seriously — our goal is to make sure your database is running smoothly to keep your trap online no matter if you’re doing operations or analytics. In memory of Biggie, we now present our 10 street rules for databases. You can listen to the 2007 remastered version as you read (or check out this excellent 2013 remix by DJ Finesse).

Rule #1: Never let no one know how much dough you hold

Database: Never let a database vendor know what your budget is.

The list prices for databases are never what you end up paying. There are always discounts to be had if you push. But if a vendor knows about your bankroll for a database project, their salespeople will be less inclined to cut you a price break since they know you can afford it. There are plenty of stories of people getting shivved by their database vendors.

In 2018, reports said that Oracle was threatening customers with expensive license audits unless they signed up for Oracle’s cloud services. Nobody said that Larry Ellison bought his Hawaiian island by being your friend.

Amazon RDS Reserved Instances give you the option to reserve a DB instance for a one- or three-year term and, in turn, receive a significant discount compared to the On-Demand Instance pricing. Reserved Instances can save you up to 69% over On-Demand rates, so do yourself a favor – don’t flush your dough down the toilet.

Rule #2: Never let 'em know your next move

Database: Never let the man know what database tool you are going to use next.

People write software tools because they want to solve a problem that they have. There are many new tools that people are turning out on the street (“Show HN” is a good place to start). The ones that I find the most interesting these days (other than OtterTune) are dbt, Datasette, and Ibis.

If you can’t solve a problem using your organization’s existing tooling, then give something new a try. You don’t need to tell your boss or manager that you are doing this. It is better to ask for forgiveness than permission. Just be careful about copying or exposing data outside your organization so that you don’t get popped.

Rule #3: Never trust nobody

Database: Never give permissions to developers who don’t need direct access to the database.

There are many ways to screw up your database. You don’t want randos at your organization logging in and running queries that they found on Stack Overflow. Or worse, they could access sensitive data that they aren’t supposed to and expose your organization to costly data breaches.

I’ve been one of those randos. At my first programming job when I was 17 years old in high school. Back then, my life was a mess. I got fired from a chicken restaurant because my friend Crazy Eddie accidentally played audio from an adult film on the building’s speaker system, and some customers complained.

To keep me out of jail, I got a job programming websites using old-school PHP3 sites backed by MySQL ISAM. I wrote a DELETE query to clean up some erroneous data from our production database. But my query did not have a WHERE clause. I ended up blowing away the entire table.

All the major DBMSs have sophisticated security measures to restrict access and operations to tables. As of 2022, the Postgres security protocols are more robust than the MySQL protocols since Postgres supports row-level security. Make sure you use these properly to protect your neck.

Rule #4: Never get high on your own supply

Database: Never store a DBMS’s monitoring data back in the same DBMS.

Database systems generate a lot of metrics about themselves. This information is helpful to monitor and understand the system’s runtime behavior. For example, the Postgres pg_stat_all_tables view tracks the number of tuples inserted, updated, and deleted in each table. Likewise, the MySQL performance schema keeps counters for similar telemetry.

OtterTune uses these metrics to identify workload patterns and recommend optimized knob configurations when you enable its optimization mode on your database. Starting this week, OtterTune also continuously analyzes these metrics for its automated health checks to ensure that your database is running properly.

All of this data is great, and it helps you understand why the DBMS does what it does (or does not). You should turn on performance monitoring.

What you should not do is store this data back in the same DBMS instance that generated it. Monitoring data is less important than application data. So why would you overburden your production DBMS with it? Use a separate DBMS to store this information or offload it to third-party services (e.g., DataDog) designed for this kind of telemetry.

Rule #5: Never sell no crack where you rest at

Database: Never run your app on the same server as your DBMS.

A DBMS has better and more stable performance when it doesn’t incur interference from external factors. Such factors are when the DBMS has to contend with other applications for hardware resources. The OS doesn’t know that DBMSs are the most important thing in the world, and therefore it strives to treat each process that is running on it fairly.

One of the craziest setups I’ve seen was an application where the web server would SSH into the DBMS box and start a terminal process to interact with the database every time a customer logged into the site. And no surprise, when their site had 1000s of active users, each with their own process running on the DBMS machine, the DBMS performance was unstable.

Luckily, Amazon RDS MySQL and PostgreSQL instances run in their own VM controlled by Amazon, preventing people from doing this.

Rule #6: That goddamn credit, dead it. You think a crackhead paying you back, shit forget it!

Database: Never let DBMS tuning eat up all your time because you won’t get it back.

Diagnosing DBMS performance issues is a daunting task. If you’re a developer with DevOps responsibilities that require you to take care of your organization’s databases, you can spend a lot of your time searching the web and forums looking for tuning advice. You’ll find plenty of blog articles that give general advice. Most of them are correct, but we sometimes come across ones that suggest changes that are incorrect for some workloads. You can get sucked down a tuning rabbit hole, and you’ll never get that time back. You are better off working on your application to add new features or fix bugs.

There are some automated tools to identify performance DBMS problems. For example, Amazon announced their DevOps Guru service in December 2021 that can automatically detect and diagnose performance in RDS. An even better idea is to use an automated DBMS optimization service like OtterTune to handle the dirty tuning stuff for MySQL and Postgres.

Rule #7: Keep your family and business completely separated

Database: Never run too many database tenants on the same DBMS instance.

Hosting too many databases on a single DBMS instance can affect performance, especially if they support write-heavy applications. Each database is contending for hardware resources, and the DBMS may accommodate all requests equally.

Unlike “enterprise” DBMSs, like Oracle, MSSQL, and DB2, the two most popular open-source DBMSs, MySQL and Postgres, do not have built-in control mechanisms to limit resources per tenant. The Postgres extension pg_cgroups can control the DBMS workers, but you cannot use it in RDS. Without such controls, the DBMS treats each tenant and their queries equally. This means that if one application’s database hogs all the CPU and I/O, then there is no way to prevent it from affecting other databases hosted on the same DBMS instance.

If you are maintaining multiple databases on a single DBMS instance now, you should make sure you use a proxy in front of the DBMS. The proxy will make it (potentially) easier for you to migrate databases to separate DBMSs in the future without having to modify application code. If you’re running on AWS, then you should consider their RDS Proxy service.

Rule #8: Never keep no weight on you!

Database: Never weigh your database down with data jums.

A DBMS is a ravenous beast. It wants to consume all the CPU, memory, and disk IOPs it can get to run your queries. And when you give your system these resources it is never satisfied.

One way to get a temporary performance boost in your DBMS is to delay maintenance tasks. These maintenance tasks can vary from OS and DBMS version updates to patch security vulnerabilities or background operations that touch the physical database (e.g., checkpoints, background writer, ANALYZE, auto-vacuum). The former avoids having to restart the DBMS, while the latter can free up resources for queries.

But know that while your DBMS may run faster for a brief period, it is accumulating “system debt” and cruft over time. For example, Postgres’ multi-version scheme (MVCC) means that every time you update a tuple, it makes a copy of that tuple with the new changes instead of overwriting the original tuple. The auto-vacuum goes through the tables and removes these dead tuples. If you turn this off, your tables will continue to grow, and it will take longer to find data because queries have to look at a bunch of useless data to find what they want.

You will also eventually run out of storage space. Similarly, if you turn off the background writer, then the DBMS will not proactively write out dirty pages in the buffer pool, which could mean that when a query needs to fetch a new page, it has to wait until the system writes out a dirty one to make space.

Don’t mortgage your database’s future by delaying these maintenance tasks. AWS makes scheduling updates during maintenance windows really easy. OtterTune can also automatically tune the frequency of these background tasks for both MySQL and Postgres to get the right balance between performance and protection.

Rule #9: It you ain't gettin' bagged, stay the fuck from police

Database: Never talk to the police about your database.

If other programmers or DBAs think you are snitching, they will not try to listen to your excuses. They will be sitting in your kitchen, waiting to start hitting.

Rule #10: Consignment strictly for live men, not for freshmen

Database: Never over-provision your DBMS for traffic that you don’t have yet.

Cloud vendors like AWS are dangerous pushers. They provide you with nearly unlimited resources for your database. It only takes a few clicks to scale up your DBMS instance. And as I said in Rule #8, the DBMS will gladly use all the resources you give it. That is unless you don’t actually need it.

You need to be careful that you don’t overprovision your database. This means using an instance type that is too large, allocating too much storage space, or needlessly jacking up your provisioned IOPs. So unless you are getting high amounts of traffic that require this heat, you are wasting money (see our previous blog article on this problem). And AWS is going to come collect their money from you whether you use those resources or not.

Try OtterTune for free. Start using AI to optimize your PostgreSQL or MySQL databases running on Amazon RDS or Aurora.

Try OtterTune for free. Start using AI to optimize your PostgreSQL or MySQL databases running on Amazon RDS or Aurora.

Get Started

Subscribe to blog updates.