In large tables, advanced indexes on several columns may end up occupying several GB of disk. Since storage comes with a cost, this side effect should be taken into account. This is also why you should periodically drop indexes that you do not use or need. Plus, defining an index on a million-row table can take minutes. Therefore, when dealing with a database with billions of rows, you should not create indexes too lightheartedly.
3. Do Not Rely on Backups
When anUPDATE
orDELETE
query goes wrong, it can be a serious problem for your business. Data is money, and losing or degrading the quality of your data means losing money. For this reason, you should always perform data backups. After all, data recovery is one of the most important aspects ofdisaster recovery. This is also why most popular database providers offer data recovery features.
At the same time, restoring data from a backup takes time. With small databases, this generally takes up to a few minutes. On the contrary, with billion-record databases, it usually takes up to a few days. If your business can survive a few minutes of downtime, it is unlikely that it make it through a few days offline or with degraded services. Thus, with a database with billions of rows, you should not rely too heavily on backups. Instead, you have to pay close attention to every write query you launch and need to know what you are doing.
Also, you should look for different ways to back up and restore data. Relying solely on the most common backup techniques or what your database provider offers you may not be the best approach. For example, theMySQL’s LOAD DATA INFILEstatement allows you to read rows from a text file into a table at a very high speed. So, you should also consider exporting table data to simple text files as backups.
4. Optimize Your Queries
Spending time writing optimized query when dealing with small databases is not too important. Instead, when working with a billion-row database, it becomes essential. A poorly written query can take several seconds or even minutes. This may result in unacceptable performance for end users.
Also, considering that you should create indexes sparingly when dealing with such a large database, you must be sure to be taking advantage of them. You cannot simply write the first query to extract the data you want that comes to your mind. You need to know what the query will do, why, and what indexes will be used as a result.