- Lance Johnson 12.04.2014
As developers on a LAMP stack, we work with MySQL databases all the time. Sometimes we are building out the database architecture ourselves, sometimes we are working with standard CMS database configurations, and sometimes we inherit our clients' legacy databases that were built by a previous developer. Working with databases so often, we have learned a lot - sometimes the hard way - and so we put together this list of 8 riskiest things you can do to with your data to maximize the chance of data loss.
1. Don't plan your database structure
One of the surest ways to cause yourself and other developers heartache is to neglect to plan your database structure. After all, when starting a new project or adding new features, it is so easy to just start building. If you are happy with inconsistent naming conventions, duplicated content, and unintelligible table schemas, then by all means start building your database tables before you've considered how they will all work together.
How Green Egg Media avoids this: We've felt the pain on this one before. As a new company in 2008, we were often so enthusiastic to be building something that we just started creating database tables whenever we thought we needed them. After a few of our earliest projects where a lack of planning caused countless hours of re-work, though, we decided it was well worth it to spend some extra time at the start of every project to plan out exactly what the database tables would look like. This allowed us to see problems with duplicated content much earlier, and allowed us to employ a DRY methodology much more effectively. Before the first database table is created in a custom project or added to a CMS, we now create a complete architecture to show the name of each table, the structure of those tables, and any relationships that exist between them.
2. Fail to define indexes and/or keys
Indexing your database tables might not seem like a big deal at first, and if you're happy to perform a full scan and read tables row-by-rown for every query, then by all mean, you can ignore indexes. Then, when your database tables grow to hundreds, thousands, or even millions of rows and you're running in a production environment where you're processing mutiple requests every second, you can be sure that your database will get slower and slower until it just stops responding altogether. Problem solved.
How Green Egg Media avoids this: Indexing actually costs a little bit more for every write operation you perform to an indexed table, but the cost savings on data retrival is almost always worth it, especially when row counts are very high. When we are defining our databases structures (see #1 above), we determine which columns should be indexed to ensure that we can optimize our data retrival down the road. Of course, we always ensure that all of our tables have a unique key which can be used to define relationships and eliminate a lot of duplicated content. Need help understanding indexes? Check out this quick and easy tutorial on MySQL Indexes.
3. Ignore backup routines
You've heard it before - backup your data! If you like living life on the edge, though, feel free to continue waiting for "mañana" to start backing things up. Eventually you'll try to install an add-on or upgrade your CMS just to have the process fail somewhere in the middle after half of your database tables have been updated, while the other half haven't. And when you try to run the process again, all you get are errors about missing columns, failed databse forges, and already existing keys. But that's OK, because you can always spend the next 20 hours that your site is offline trying to manually undo those changes and figure out just what went wrong before casually hitting that "Start Upgrade" button again.
How Green Egg Media avoids this: There is really nothing sexy about a backup... until you need it, and we promise that the time will eventually come when you need one. We have put several backup systems in place to protect our clients' data, including automated server-based backups, pre-upgrade backups, SQL dumps that are assigned to version control commits, and more. No matter how careful we are while we're developing, there are so many external factors, that we have at least one time each year that we need to roll back to a previous version of one of the databases that we manage. Our storehouse of backups sure is pretty sexy then.
4. Don't check for errors
I bet a lot of younger developers don't even realize that there is a MySQL error log. That's fine if you don't care when your queries aren't working the way they should. Not all MySQL errors result in massive error messages that flash up on your screen, particularly if you are suppressing errors because you don't want users in your production environment to see them. Failing to regularly review your MySQL error log is one sure-fire way to end up with a database full of bad data or missing expected data. If you let your database run long enough with fixing these errors, then you'll have the added advantage of missing so much good data that your database has become almost unusable. Nice job!
How Green Egg Media avoids this: Not every single thing that appears in the MySQL error log requires immediate attention, but some things do. We conduct regular reviews of all of our clients' MySQL error logs and ensure than any errors that are problematic are addressed. While we're reviewing the MySQL logs, we also take a look at the PHP and Apache logs to make sure that the data we are writing and reading from the database is working as expected with the rest of the systems. Sometimes it can take a little detective work to track down some of the peskier problems that manifest themselves, but it's always worth it in the end.
5. Always load the biggest dataset
Make SELECT * FROM my_table; your best friend. Nevermind that my_table has 250,000 rows with 30 columns in each row. The more the merrier! Right? If you, your clients, and your users enjoy slow pageload times, then sure, keep up the good work. So many developers are in the habit these days of loading everything they could ever posisbly need with their queries. Nevermind that they end up accessing only 1% of what they have actually queried from the database. Oh, and caching query results? Don't worry about that - your server is probably fast enough and has enough memory that it can totally load everything on every single page load.
How Green Egg Media avoids this: When loading data, we write optimized queries to ensure that we are loading only the data that we actually need. This is much easier to do becasue we've already planned out the data structures and defined the keys and indexes we know we're going to need. We also enable caching whenever posisble to avoid loading the same data sets over and over again. Even optimized queries can add unneccesary strain to the database server if they are called over and over again in rapid succession. Sometimes we get stuck with third-party systems where we can't (or shouldn't) re-write the database queries, so in this case, we rely particularly heavily on other caching or lazy loading mechanisms to reduce the strain on the database server as much as possible.
6. Don't document your code
How Green Egg Media avoids this: We believe that code worth writing is code worth documenting. We don't advocate writing comments for every line of code that we write, but we do make sure that our code is documented well enough that when we come back to the code 12 months later, or when we hand it off to our project partners, everyone can understand what the code is supposed to do.
7. Use cheap servers
Why would anyone ever pay more than $1.99 a month for web hosting? OK, maybe you can justify $3.99 a month for a little more storage space, but anything more than that would just be a waste of money, right? That's absolutely true if you don't care about the other 1,000 websites that are being hosted on that same server, using the same resources you're using. A single unoptimized database query on one of the other tenants on your shared server can occupy MySQL processes and slow down your website - and you have absolutely no control over it. But that's OK, because you like life in the fast lane - without a seatbelt.
How Green Egg Media avoids this: We never deploy our clients' websites on cheap, shared hosting environments. When we provide hosting for our clients, we always deploy them on servers over which we have complete control, some of which even utilize solid state drives for maximum speed. For some of our clients who are doing more resource intensive tasks, we put them on their own fully managed VPS. Not every client needs a $150 / month VPS of their own, though, so we sometimes put several of our clients on the same server that we control and manage. Since we control every piece of code that runs on that server, though, we don't have to worry about a rogue tenant impacting the stability of our database servers. While the cost of our hosting solutions is more expensive, we deliver a superior product that is designed with speed and efficiency in mind.
8. Fail to optimize your server environment
OK, so you've got your own private server, with plenty of RAM, a fast disk, and ample storage. That's enough, right? Web and database servers have thousands of configuration options, but you're content to just load up the default configuration and let it ride. For very small websites, this might actually be fine, but failure to review, understand and optimize your server's configuration will almost always lead to hitting resource limits at inopportune moments. Having 32GB of RAM on your high-powered database server is useless if your configuration is limiting processes to 32MB of RAM. At the same time, if you find you need to set a 16GB limit for your processes, you've probably got some serious optimization problems on your hands. But that's OK, because your database server will probably crash pretty quickly, and you can just relax knowing that no more errors are being generated while the server is down.
How Green Egg Media avoids this: We understand how to configure and optimize Linux servers to maximize performance for the systems we build for our clients. We operate dozens of different configurations for our clients because we know that one size doesn't fit all. Some of our clients really only need the basics, so we strip out all of the things that they don't need to really optimize their processes. At the end of the day, we strive to ensure that all of our client websites are running on servers that have been fully optimized with their codebase in mind.
Everyone makes mistakes at some point - just make sure your mistake doesn't cost you your data or your client's data. If you understand (and avoid) these eight fatal pitfalls, you'll be well on your way to database integrity.
At Green Egg Media, we pride ourselves on delivering the best possible experience for all of our clients. We want to really know our clients and understand their needs. We know that our clients rely on their websites for their businesses. If you are struggling with any of the issues we've explored above, we would love to help you. Just get in touch with us, and we'll get back to you as soon as possible to understand what you need.
AWS Elastic Load Balancers communicate with EC2 instances over standard HTTP, which can cause issues when WordPress is setup to use SSL. A single line of code in wp-config.php can fix this problem.