NGINX is still trailing relatively far behind Apache, but there is little doubt that it is gaining popularity — W3Techs puts NGINX usage at 31%, behind Apache’s 51%. This trend persists despite certain difficulties the NGINX community sometimes laments, such as a lack of ease of use and quality documentation. For now, it seems NGINX’s low memory usage, concurrency, and high performance are good enough reasons to put those issues aside.
As with any web server, logging NGINX is something of a challenge. NGINX access and error logs can produce thousands of log lines every second — and this data, if monitored properly, can provide you with valuable information not only on what has already transpired but also on what is about to happen. But how do you extract actionable insights from this information? How do you effectively monitor such a large amount of data?
This article describes how we at Logz.io overcome this particular challenge by monitoring our NGINX access log with the ELK Stack (Elasticsearch, Logstash, and Kibana). NGINX access logs contain a wealth of information, including client requests and currently active client connections, that, if monitored efficiently, can provide a clear picture of how the web server — and the application that it is serving — is behaving.
By default, NGINX will log information on requests made to the web server to the /logs/access.log file (error logs are written to the /logs/error.log file). As soon as NGINX processes a request, an entry is added to the log file in a predefined format:
109.65.122.142 - - [10/Aug/2016:07:06:59 +0000] "POST /kibana/elasticsearch/_msearch?timeout=30000&ignore_unavailable=true&preference=1447070343481 HTTP/1.1" 200 8352 "https://app.logz.io/kibana/index.html" "Mozilla/5.0 (X11; Linux armv7l) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/45.0.2454.101 Chrome/45.0.2454.101 Safari/537.36" 0.465 0.454
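The two trailing numbers suggest a combined log format extended with request timing values. As a minimal sketch, an nginx.conf snippet along these lines would produce lines of this shape (the format name and file path are illustrative, not necessarily what produced the sample above):
http {
    # Standard "combined" fields plus request/upstream timing appended at the end
    log_format timed_combined '$remote_addr - $remote_user [$time_local] '
                              '"$request" $status $body_bytes_sent '
                              '"$http_referer" "$http_user_agent" '
                              '$request_time $upstream_response_time';

    access_log /var/log/nginx/access.log timed_combined;
}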
You can, of course, configure the format and the location of the log file (more on this in the NGINX documentation), but a quick look at the sample log line above tells us two things.
First, there is a lot of useful data to analyze such as the request URL, HTTP response code, and client IP address.
Second, analyzing this data together with the data collected from other sources is going to be a huge headache. That’s where ELK comes to the rescue — the stack makes it easy to ship, store, and analyze all the logs being generated by your NGINX web server. It’s NGINX log analysis made easy. Your first task is to establish a log pipeline from your server to the ELK instance of your choice. This is NOT the focus of this article — but here are a few quick tips for setting up the pipeline.
First, decide which ELK stack you want to use — either a self-hosted one or a cloud-hosted solution such as Logz.io. This decision can save you hours of work, so do your research well and weigh the pros and cons (hint: it all depends on the amount of resources at your disposal!).
Second, decide which log forwarder to use to ship the logs into ELK. There are a number of ways to forward logs into Elasticsearch, the most common being Logstash — the stack’s workhorse, responsible for parsing the logs and forwarding them to an output of your choice.
Important! Think carefully about how you want to parse your NGINX access logs — parsing ensures that the access logs are dissected and subsequently indexed properly in Elasticsearch. The more you invest in this step, the easier it will be to analyze the logs in Kibana later.
Here is an example of a Logstash configuration file for shipping and parsing NGINX access logs into Elasticsearch. In this case, we’re using a wildcard configuration to monitor both the NGINX access and error logs. The filter used here is the one we use at Logz.io to parse NGINX logs:
input {
  file {
    type => "nginx_web"
    path => ["/var/log/nginx/*"]
    exclude => ["*.gz"]
  }
}

filter {
  # Parse the log line; extra_fields captures anything appended after
  # the standard combined format (e.g., the timing values shown above)
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" ]
    overwrite => [ "message" ]
  }
  # Cast numeric fields so they can be aggregated in Kibana
  mutate {
    convert => ["response", "integer"]
    convert => ["bytes", "integer"]
    convert => ["responsetime", "float"]
  }
  # Enrich each event with geographical data derived from the client IP
  geoip {
    source => "clientip"
    target => "geoip"
    add_tag => [ "nginx-geoip" ]
  }
  # Use the request's own timestamp as the event timestamp
  date {
    match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
    remove_field => [ "timestamp" ]
  }
  # Break the user-agent string into browser, OS, and device fields
  useragent {
    source => "agent"
  }
}

output {
  # Point this at your Elasticsearch instance (host shown is illustrative)
  elasticsearch { hosts => ["localhost:9200"] }
}
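Before starting Logstash, it’s worth validating the configuration first. Something along these lines should work (the config path is illustrative, and the exact flag name varies slightly between Logstash versions):
bin/logstash -f /etc/logstash/conf.d/nginx.conf -t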
There are other log shippers you can use — such as Filebeat or Fluentd — but for the sake of simplicity, you’ll probably want to start with Logstash. (See our comparison of Fluentd versus Logstash.)
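If you do opt for Filebeat, a minimal configuration sketch might look like the following (paths and hosts are illustrative, and the exact keys vary slightly between Filebeat versions). Note that shipping through Filebeat means the Logstash pipeline above would use a beats input listening on the matching port instead of the file input shown earlier:
# filebeat.yml (sketch)
filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/access.log
      - /var/log/nginx/error.log

output.logstash:
  hosts: ["localhost:5044"]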
Once your NGINX access logs are being shipped into Elasticsearch, it’s just a matter of deciding which data you wish to visualize and monitor in a dashboard. I say “just”, but the truth of the matter is that this task can be extremely time-consuming if you don’t know your way around Kibana or don’t entirely understand how your NGINX access logs are being parsed. For that reason, Logz.io provides a library of ready-made visualizations and dashboards called ELK Apps. Basically, in just (no quotation marks this time!) one click, you can get started with the NGINX monitoring dashboard described below (learn more about ELK Apps here). But for those who are not using Logz.io, the next section will show how we created this dashboard as well as explain the metrics that we thought were important to extract and monitor in NGINX access logs.
A map of where requests to the server originate is the most obvious visualization to include in any monitoring dashboard. Knowing from where in the world people are accessing your website is important not only for troubleshooting and operational intelligence but also for other use cases such as business intelligence (as explained in this article). Constructing a geo-access visualization is pretty straightforward if the access logs are being parsed correctly. The configuration for the visualization uses a metric count aggregation (counting the number of requests) and a bucket geohash aggregation on the geoip.location field.
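For reference, the bucket behind this map corresponds roughly to a geohash_grid aggregation in Elasticsearch. The query below is a sketch of that equivalent request rather than the exact one Kibana generates; the index pattern is illustrative, and it assumes geoip.location is mapped as a geo_point:
curl -s -H 'Content-Type: application/json' 'localhost:9200/logstash-*/_search' -d '
{
  "size": 0,
  "aggs": {
    "requests_by_location": {
      "geohash_grid": { "field": "geoip.location", "precision": 3 }
    }
  }
}'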
Monitoring the average number of bytes being sent by NGINX is a great way to identify when something in your environment is not performing as expected. There are two extremes that may indicate something is wrong — when the number of bytes is either drastically lower or drastically higher than average. In both cases, you will first need to be acquainted with the average value under regular circumstances. The configuration for this line chart visualization includes a Y axis displaying an average of the bytes field and an X axis using the Date Histogram aggregation type:
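Under the hood, this chart is essentially a date_histogram bucket with an avg sub-aggregation on the bytes field. A rough equivalent query looks like this (the index pattern and interval are illustrative; newer Elasticsearch versions expect calendar_interval or fixed_interval instead of interval):
curl -s -H 'Content-Type: application/json' 'localhost:9200/logstash-*/_search' -d '
{
  "size": 0,
  "aggs": {
    "bytes_over_time": {
      "date_histogram": { "field": "@timestamp", "interval": "1h" },
      "aggs": {
        "avg_bytes": { "avg": { "field": "bytes" } }
      }
    }
  }
}'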
The most frequently requested URLs are probably one of the most important metrics to monitor, as they give you an idea of which requests NGINX is serving most often. When something goes wrong, this can be a good place to start, because it might indicate which service crashed your entire environment. The configuration for this bar chart visualization includes a Y count axis and an X axis using the Terms aggregation type on the request field (the example is configured to show the top 20 results):
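Expressed as a raw terms aggregation, the same idea might look roughly like this (the exact field name depends on your mapping — it may be request, request.raw, or request.keyword). The response-code, user-agent, country, browser, and OS breakdowns described below follow this same shape with a different field:
curl -s -H 'Content-Type: application/json' 'localhost:9200/logstash-*/_search' -d '
{
  "size": 0,
  "aggs": {
    "top_requests": {
      "terms": { "field": "request.keyword", "size": 20 }
    }
  }
}'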
Seeing a breakdown of the HTTP response codes returned for requests made to your NGINX web server is extremely important for getting a general overview of the health of your environment. Under regular circumstances, you would expect to see an overwhelming majority of “200” responses. When you begin to see “400” and above responses, you will know something is wrong (ideally, you’d be able to use an alerting mechanism on top of ELK to notify you of such occurrences). The configuration for this bar chart visualization is identical except that this time we are using the “response” field for the X axis:
While not crucial for operational intelligence, seeing which client agents are sending requests to NGINX is useful for gathering business intelligence. For example, say you added a new feature only available for a specific newer version of Chrome and want to understand which user segment will not be able to use it — this would be a great way to find out. The configuration for this data table visualization consists of a metric count aggregation, and a bucket configured to aggregate the agent field (this example is also set to display the top 20 results):
Last but not least, I’ve grouped together these four visualizations since their configurations are almost identical. They all provide the top twenty values for metrics on the client sending the request to the server — the top twenty operating systems, countries, and browsers. The configuration for these pie chart visualizations consists of a metric count aggregation and a bucket configured to aggregate the geoip.country_name field (also set to display the top 20 results). To build the other visualizations, we simply switched the field to name (for the top 20 browsers) and os (for the top 20 operating systems).
Once you have all of your visualizations ready, it’s just a matter of going to the Dashboard tab and compiling them together. The result is a comprehensive monitoring dashboard that gives you real-time data on how your NGINX server is being accessed:
Analyzing your NGINX access logs provides you with good visibility not only into the health and activity patterns of your environment but also into the operational and business intelligence needed to improve and enhance your product. Adding error logs into the picture (by shipping the error log into ELK as well) adds another layer to your monitoring, but it also gives you the option of identifying correlations between the two logs and thus enables faster troubleshooting. There are many paid solutions out there for monitoring NGINX, such as NGINX Plus — a comprehensive offering that includes monitoring out of the box. Naturally, it all depends on the resources you have available. To construct the dashboard above, you “just” need some time to put together some open source technologies.