- Lance Johnson 01.31.2011
Recently, I had need to write a small web app that would process hundreds of thousands of lines from a tab-delimited file into a database and run some logic against the data once a day. I immediately turned to CodeIgniter, knowing that I could rely on its speed, power, and flexibility. During the earliest tests, I was only processing the first thousand lines of the file and things were going swimmingly. The time came, though, when I was ready to run a test against the entire file, a file which contained over 335,000 lines. I had written some logic utilizing CodeIgniter’s Database Forge class to dynamically create the database tables and table structure based on information passed in the first few lines of the file, so I dropped my tables, reset my testing limits, and called up my controller.
Things seemed to be working well for the first 5 minutes. I watched as the database grew by a thousand records every few seconds. At the current rate, I calculated, it should take anywhere from 12-18 minutes to parse the entire file and return my results. But at about 9 minutes everything just stopped. The database had grown to around 150MB, and about 190,000 records, but that was all. None of my calculations had been run, no errors had been thrown by CI, and none of my debug logs had been displayed. It was clear that, for some reason, my application had just decided to stop. This was odd, as usually CI tells me why it stopped, but not this time. I checked my PHP error logs, and found what appeared to be a timeout error. Ah - an easy fix, thought I. I dropped in an ini_set() line at the top of my controller, cleared out the database, and let it fly again.
Things proceeded as before. As the database got closer and closer to 190,000 records, I held my breath. And… success? 191, 192, 193,000. It looked like things were working until suddenly everything stopped again around 195,000. There is no way I had hit the PHP timeout limit. The fact that the application had processed a few thousand more records suggested that the problem wasn’t with a particular line in the file. The PHP error log this time also didn’t have a timeout error, but there did appear to be a memory error. I was surprised at this. While the file I was processing was about 180MB, I was using fopen() and fgets() in order to avoid loading the entire file into memory. I was developing locally and already allocating 32MB of memory to PHP, so why was I running up against a memory limit? I decided to try setting a higher memory limit temporarily. I ramped up my allocated memory to 256MB. I wasn’t thrilled about this. A file parser that needs 256MB of RAM just to run? That’s not very efficient. And, it turns out, not only would it not be efficient, but it also didn’t work. I got to abut 240,000 records this time, but then I hit the memory limit again. What was going on?
Finally, I discovered what the problem was - CodeIgniter. Generally speaking, the documentation for CodeIgniter is excellent, but in this case, there was something going on that I wasn’t aware of, and that the documentation never mentions - Query Saving. This is not to be confused with query caching. That is a documented feature of CI, and one that I checked during my debugging. Query Saving is a feature of CI’s database class that stores the results of every query in memory until the controller is finished executing. As it turns out, in version 1.6.0, the ability to turn this off was added. The addition of the save_queries variable is listed in the Change Log, but as of the latest release of 2.0.0 last week, it still hasn’t made the documentation.
$this->db->save_queries = FALSE;
This single line of code in my controller allowed me to process my file, run my calculations, and generate the results I needed using only the base level of memory. For the most part, people probably don’t need to worry about whether or not CI is saving the queries, but in cases where you need to process huge amounts of data at the same time, it definitely becomes an issue. This can’t really be qualified as a memory “leak,” since CI is doing exactly what it was written to do. The question could be raised whether or not this is a required feature, though. I didn’t even know that CI was doing this, let alone that I could turn it off. It doesn’t make sense that the only reference to this variable should be in the Change Log for version 1.6.0. This is definitely something that should make the documentation for the Database Class. But until then, hopefully anyone who might be experiencing a PHP memory issue with CodeIgniter and MySQL will find this post and a solution to their problem.