Ajax call is significantly slower on live server

February 17, 2018
Tags: Django, Server Optimization, Ubuntu 16.04
By: Bungol

In my app, there are a lot of ajax calls. A user may do 50 ajax calls per session, so speed is very important. The ajax call queries the database and returns a list of records.

Doing the ajax call on my local development server takes about 400 ms. The identical ajax call on the live website takes anywhere from 1.2 to 1.8 seconds.

I checked the droplet console - there is no stress on the CPU, RAM or disk. Everything is operating at very low levels.

Why is it so much slower on the live server? What could be causing this? Everything else is the same, except the ajax call takes an extra second on the live server.

Also, this started happening only yesterday, after I added some additional records and made some minor changes to the code. Before yesterday, I did not notice any major speed differences between development and production.

I'm using Django 1.11 on Ubuntu 16.04 with Gunicorn & Nginx.

3 Answers

There are a couple of things to note. First off, a 400ms database call, even in a local environment, is actually pretty slow. You really want these queries executing in under 100ms, and it's unclear whether the 400ms is a single ajax call or the combination of all ajax calls.

The second part is that if you are dealing with slow queries, you can turn on the slow query log on your database. That creates a log file of all slow queries, so you can drill down and see which specific queries are slow.
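For PostgreSQL, for example, the slow query log is controlled by log_min_duration_statement in postgresql.conf. A quick Django-side way to time individual queries from `python manage.py shell` is sketched below; the Listing model and the filter fields are made-up placeholders for whatever the ajax view actually queries:

```python
# Rough sketch for `python manage.py shell`; Listing and its fields are
# placeholders for the real model behind the ajax view.
from django.db import connection
from django.test.utils import CaptureQueriesContext

from myapp.models import Listing  # hypothetical app/model

with CaptureQueriesContext(connection) as ctx:
    # list() forces the queryset to execute so its SQL gets captured.
    list(Listing.objects.filter(lat__gte=43.56, lat__lte=43.79,
                                lng__gte=-79.66, lng__lte=-79.16))

for query in ctx.captured_queries:
    print(query['time'], query['sql'][:120])
```

Any query that consistently prints times near or above 0.1 seconds is worth putting through the database's own tooling.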

Oftentimes when you have a mismatch like this, it means indexes are set up in your local environment but missing on the corresponding tables/columns in the production database, which would cause a significant change in query execution time. So check your database tables to confirm the same indexes exist in both your local and production environments.
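In Django 1.11 you can also declare indexes on the model itself, so running migrations creates them in every environment. A sketch, with a made-up Listing model and placeholder fields:

```python
# models.py -- sketch only; the model and field names are placeholders.
from django.db import models

class Listing(models.Model):
    lat = models.FloatField(db_index=True)   # single-column index
    lng = models.FloatField(db_index=True)
    sold_date = models.DateField(null=True, blank=True)

    class Meta:
        # Composite index for queries that filter on both columns together
        # (Meta.indexes was added in Django 1.11).
        indexes = [models.Index(fields=['lat', 'lng'])]
```

After adding these, makemigrations and migrate still have to be run against each database; an index created by hand on the development database does not carry over to production.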

The last part is that you mentioned everything was fine until you made a code change. Nine times out of ten, the code change is what caused the problem. If you aren't seeing any significant load on the CPU, RAM, or other system resources, and the queries are consistently slow rather than periodically slow, then you want to make sure the code change isn't at fault. Revert to the version before the changes and try again.

@moisey Thanks for your detailed response.

I did not know about indexes before. I just added them, but that only improved performance on my development server; performance is still bad in production, and in fact I think it is slightly worse now.

Ajax calls are just really inconsistent in production. A smaller query takes 500ms to 1.5 seconds, and a larger query takes 5 to 20 seconds! Not joking, 20 seconds. The same ajax call in development takes about 500-700ms.

Here is a link to one of those super long ajax calls - https://www.bungol.ca/api/filter_closed_listings/?minLat=43.56211925192369&maxLat=43.78560617695566&minLng=-79.65518145178225&maxLng=-79.15736345861819&zoomLevel=12&sold=30&pType=0&bed=undefined&bath=undefined&low=0&high=0&includePandL=false&showActiveListings=y&showSoldListings=y&showEndedListings=n&transactionType=Sale. This one took 22 seconds for me.

If the problem were in the code, shouldn't the call times be similarly slow in both development and production? Yet it is only slow and inconsistent in production.

So maybe it is the indexing setup? But I only added the indexes yesterday, after I had already noticed the performance discrepancy between development and production, so that seems unlikely.

Looking at the chart, there is abnormally high disk usage, even with just 2 or 3 people browsing the site at a time. Is that a normal amount of disk usage?

I'm a newb at this, but it seems to me something is wrong with the server. The code being executed in my development and production environments is identical.

  • A couple of things: if the server is under no real load, either dev or prod, then the queries should take about the same amount of time each time you run them. So if a query took 500ms, it should take roughly that long the next time you run it. If you are seeing variability with no load, that could point to a droplet/server issue.

    In terms of the difference in speed, it depends on what your local dev environment is, what else is running, and how many resources your production system has. It's a bit difficult to compare apples to apples; without knowing your laptop/desktop specs and the load before executing locally, it's hard to say which should be faster.

    What you can do to check if it's a droplet/server issue is power off the server, create a snapshot, then create a new droplet from the snapshot and rerun the query. That way you can compare two different droplets running the same code and see whether they perform the same or whether one performs better than the other.

    If the second droplet performs better, then it sounds like a droplet/server issue, and you can open a support ticket to let the support team know that the hypervisor the original droplet is on may be having load issues.

    That will help knock out some options there.

    Hopefully you'll have this working well on the new droplet (not sure yet whether that is the cause or not). After that, I would go back and see which queries are taking so long to execute, because 500ms is still a very long time. That's half a second, which is way too long, so I would continue troubleshooting there unless the new droplet executes it significantly faster than 500ms. It's hard to say whether the 500ms locally is just a quirk of the local dev environment and production is faster, but either way you really want DB queries to be well under 100ms.

    • Just tried the new droplet method; the performance is the same. Does this mean the problem is in my database or in my code? The code is identical in development and production, so I'm guessing the problem is the database.

      A note on consistency: the calls are consistent, just consistently slow for certain queries. E.g. an ajax call that takes 1.5 seconds will always take about 1.5 seconds, and a call that takes 20 seconds will always take about 20 seconds.

      Here is a link to the page - https://www.bungol.ca/map/ - As you pan the map and start zooming out, things get slow. Interestingly, the 20-second calls start happening at a medium zoom level. When you zoom out all the way, the calls drop to about 6 seconds.

      And I agree that even 500ms is too long. I can optimize that by making the code more efficient, which will be done once there aren't any more 20-second calls in the production environment!


      This may be a clue, but I don't know how to analyze it. There is a heavy amount of disk read when someone is using the server, going up to tens of MBs for a single query. That doesn't feel normal. (Screenshot attached.)

      • Ok, so with the new droplet the performance was the same, so it looks like it is related to your code / database / disk IO. What you can do is look at the disk IO rates on your local environment; perhaps the difference between 500ms and 1.5s is related to disk IO speeds and not CPU/RAM.

        Additionally, that is a very large amount of disk usage for a single request, or even a handful of individual requests, so that is a bit abnormal.

        I would say try running iotop to examine which processes on your system are actually causing that disk usage, and we can take it from there to investigate further.

        You can also think about using something like NewRelic, which provides application-level monitoring and can trace through the execution of the code to see where your largest bottlenecks are. But before we get that far we need some system-level information, from iotop, top, etc., that shows which process is really causing the delay, as there may be an easier fix.
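        In the meantime, a bare-bones timing middleware can at least log how long each request and its queries take. This is only a sketch (the module path and logger name are arbitrary), and note that connection.queries is only populated when DEBUG is on, so in production it may report wall-clock time only:

        ```python
        # myproject/middleware.py -- hypothetical path; add its dotted path to
        # MIDDLEWARE in settings.py to enable it.
        import logging
        import time

        from django.db import connection

        logger = logging.getLogger(__name__)


        class RequestTimingMiddleware:
            def __init__(self, get_response):
                self.get_response = get_response

            def __call__(self, request):
                start = time.time()
                queries_before = len(connection.queries)
                response = self.get_response(request)
                elapsed = time.time() - start
                # connection.queries is only filled when DEBUG=True, so these
                # query stats may come back empty in production.
                queries = connection.queries[queries_before:]
                sql_time = sum(float(q['time']) for q in queries)
                logger.info("%s took %.3fs (%d queries, %.3fs in SQL)",
                            request.path, elapsed, len(queries), sql_time)
                return response
        ```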

        Also, you mentioned that you are running something related to maps. Is this a plugin that you are using? Have you checked its documentation, or whether anyone has posted troubleshooting tips for improving its performance?

        It is also possible that something should be cached and isn't. That would also explain the 500ms latency you are seeing locally.
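        As a sketch of what caching could look like with Django's cache framework (the view name here is only a guess based on the ajax URL above):

        ```python
        # views.py -- sketch; filter_closed_listings is a guessed view name.
        from django.http import JsonResponse
        from django.views.decorators.cache import cache_page


        @cache_page(60 * 5)  # cache each distinct URL, query string included, for 5 minutes
        def filter_closed_listings(request):
            listings = []  # placeholder for the real query and serialization
            return JsonResponse({'listings': listings})
        ```

        One caveat: the map sends a slightly different bounding box on every pan and zoom, so per-URL caching may rarely hit; the low-level django.core.cache API can cache under a coarser key (for example a rounded bounding box) instead.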

        • For this round of testing I chose a query that is taking ~25 seconds in production. Link to query.

          I installed iotop, and running this query alone drives the disk read I/O as high as 25 MB/s. I did this on a test droplet, so it was definitely the only query the database was handling at the time. Nothing seems out of place other than the super high read I/O; postgres was the process taking up all of it.

          I checked the disk read I/O for the same query in development, and the disk I/O didn't even change. It's probably less than 0.1 MB/s.

          So something in the production environment is eating up a huge amount of disk I/O, even for a single query. This wasn't the case a week ago. I'm trying to recall what I could have modified on the server that caused this, or perhaps some automatic Ubuntu update is the problem?
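          My next step is to run that slow query through EXPLAIN (ANALYZE, BUFFERS) to see whether postgres is doing a sequential scan and reading everything from disk. A rough sketch from the Django shell (the actual SQL has to be pasted in, and EXPLAIN ANALYZE really executes the query, so expect it to be slow):

          ```python
          from django.db import connection

          slow_sql = "SELECT ..."  # paste the exact slow query here

          with connection.cursor() as cursor:
              cursor.execute("EXPLAIN (ANALYZE, BUFFERS) " + slow_sql)
              for (line,) in cursor.fetchall():
                  print(line)
          # A "Seq Scan" on the listings table plus large "Buffers: shared read=..."
          # numbers would line up with the heavy disk reads iotop is showing.
          ```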

          P.S. I'd say the 500 ms latency in development is reasonable, because the code is inefficient (will fix it soon...).
