Extract Log Data with `grep` and `awk`

Efficiently filter and extract specific data from large log files (e.g., web server access logs) using a combination of `grep` for pattern matching and `awk` for column-based extraction.

#!/bin/bash

# Configuration
LOG_FILE="/var/log/nginx/access.log" # Example: /var/log/apache2/access.log
SEARCH_STRING="500"
IP_ADDRESS="192.168.1.1"

echo "Searching for logs containing '$SEARCH_STRING' and from IP '$IP_ADDRESS' in $LOG_FILE"

if [ ! -f "$LOG_FILE" ]; then
    echo "Error: Log file not found at $LOG_FILE" >&2
    exit 1
fi

# Example: Filter Nginx access log for 500 errors from a specific IP
# and extract timestamp, request method, URL, and user agent.
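# This assumes the default Nginx "combined" log format (Apache's "combined"
# format has the same layout):
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"

# Grep for the search string and IP, then pipe to awk for column extraction.
# Whitespace-separated awk fields for this format:
# $1=IP, $4=Timestamp (with leading [), $5=Timezone offset (with trailing ]),
# $6=Method (with leading "), $7=URL, $8=Protocol (with trailing "), $9=Status,
# $10=Bytes sent, $11=Referer (quoted), $12 onward=User agent (quoted, may span several fields)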

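# pipefail makes the pipeline's exit status non-zero when grep fails or finds
# nothing, instead of reporting only awk's status.
set -o pipefail

# grep -F matches the strings literally (the dots in the IP are not regex
# wildcards). grep still matches substrings anywhere in the line, so awk
# re-checks the exact IP ($1) and status ($9) fields.
grep -F -- "$SEARCH_STRING" "$LOG_FILE" | \
  grep -F -- "$IP_ADDRESS" | \
  awk -v ip="$IP_ADDRESS" -v status="$SEARCH_STRING" '
    # Skip lines where the IP or status only matched as a substring
    # (e.g. 192.168.1.10, or a response of 500 bytes).
    $1 != ip || $9 != status { next }
    {
      found = 1
      timestamp = substr($4, 2) " " substr($5, 1, length($5) - 1)  # strip [ and ]
      method = substr($6, 2)  # drop the opening quote of "$request"
      url = $7
      # The quoted user agent starts at $12 (assuming the referer in $11 has
      # no spaces) and may span several whitespace-separated fields.
      user_agent = $12
      if (substr(user_agent, 1, 1) == "\"") {
        # Append fields until the closing quote, then strip both quotes.
        for (i = 13; i <= NF && substr(user_agent, length(user_agent), 1) != "\""; i++)
          user_agent = user_agent " " $i
        user_agent = substr(user_agent, 2, length(user_agent) - 2)
      }
      printf "Timestamp: %s | Method: %s | URL: %s | Status: %s | User-Agent: %s\n",
             timestamp, method, url, $9, user_agent
    }
    # Exit non-zero when no line matched, so the check below can report it.
    END { if (!found) exit 1 }'

if [ $? -eq 0 ]; then
    printf '\nLog extraction complete.\n'
else
    printf '\nNo matching entries found, or log extraction failed.\n' >&2
fi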
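The final `if [ $? -eq 0 ]` check depends on how Bash reports a pipeline's exit status. The sketch below, using a made-up log line that contains no "500", shows that by default `$?` reflects only the last command (here `awk`), that `PIPESTATUS` records each command's status, and that `set -o pipefail` surfaces grep's "no match" status (exit 1).

```shell
#!/bin/bash
# A made-up access log line that does NOT contain "500".
sample='192.168.1.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"'

# By default, the status of a pipeline is that of its LAST command:
# grep finds nothing (exit 1), but awk still exits 0.
status=0
printf '%s\n' "$sample" | grep -F '500' | awk '{ print }' || status=$?
echo "default status: $status"        # prints: default status: 0

# PIPESTATUS holds each command's status from the most recent pipeline.
printf '%s\n' "$sample" | grep -F '500' | awk '{ print }'
echo "per-command: ${PIPESTATUS[*]}"  # prints: per-command: 0 1 0

# With pipefail, the pipeline returns the rightmost non-zero status,
# so grep's "no match" is no longer hidden by awk's success.
set -o pipefail
status=0
printf '%s\n' "$sample" | grep -F '500' | awk '{ print }' || status=$?
echo "pipefail status: $status"       # prints: pipefail status: 1
```

Reading `${PIPESTATUS[0]}` right after the pipeline is an alternative to `pipefail`, but it has to happen before any other command runs, because the next command overwrites `PIPESTATUS`.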
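How it works: This Bash script filters and extracts specific fields from large log files such as web server access logs. It first uses `grep` to keep only the lines that contain `SEARCH_STRING` (here an HTTP status code, '500') and `IP_ADDRESS`. The filtered lines are piped to `awk`, which splits each line into whitespace-separated fields and prints the timestamp, request method, URL, status, and user agent as one readable line per request. Because `grep` matches substrings anywhere in a line, '500' also matches a 500-byte response or a URL containing 500, and '192.168.1.1' also matches 192.168.1.10; comparing the IP ($1) and status ($9) fields exactly in `awk` avoids these false positives. Running `grep` first is still worthwhile on large files, since it typically discards most lines faster than `awk` could parse them. The result is a quick way to see, for example, which requests from a given client triggered server errors.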
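Counting whitespace-separated fields breaks if a referer ever contains a space, because every later field shifts. A more robust sketch splits each line on the double quote character instead, so the request, referer, and user agent each land in a single field, and it compares the IP and status exactly rather than as substrings. It assumes the combined log format and that no literal `"` appears inside a logged value; `extract_errors` is a hypothetical helper name and the sample lines are made up.

```shell
#!/bin/bash
# Hypothetical helper: print details of requests with a given status from a
# given IP. Splitting on " (combined log format assumed) gives:
#   $1 = 'IP - user [time zone] '   $2 = request     $3 = ' status bytes '
#   $4 = referer                    $5 = ' '         $6 = user agent
extract_errors() {
  # Usage: extract_errors STATUS IP FILE
  awk -F'"' -v status="$1" -v ip="$2" '
    {
      split($1, pre, " ")  # pre[1] = IP, pre[4] = "[time", pre[5] = "zone]"
      split($2, req, " ")  # req[1] = method, req[2] = URL, req[3] = protocol
      split($3, mid, " ")  # mid[1] = status, mid[2] = bytes
      # Exact comparisons: 192.168.1.10 or a 500-byte body will not match.
      if (pre[1] != ip || mid[1] != status) next
      timestamp = substr(pre[4], 2) " " substr(pre[5], 1, length(pre[5]) - 1)
      printf "Timestamp: %s | Method: %s | URL: %s | Status: %s | User-Agent: %s\n",
        timestamp, req[1], req[2], mid[1], $6
    }' "$3"
}

# Made-up sample log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.1 - - [10/Oct/2023:13:55:36 +0000] "GET /api/orders HTTP/1.1" 500 153 "-" "Mozilla/5.0 (X11; Linux x86_64)"
192.168.1.10 - - [10/Oct/2023:13:56:01 +0000] "GET /api/orders HTTP/1.1" 500 153 "-" "curl/8.0"
192.168.1.1 - - [10/Oct/2023:13:57:12 +0000] "GET /files/report HTTP/1.1" 200 500 "-" "curl/8.0"
EOF

extract_errors 500 192.168.1.1 "$log"
rm -f "$log"
```

Running it prints details for the first sample line only, with the multi-word user agent intact: the second line is skipped because its IP is 192.168.1.10, and the third because its status is 200, even though its byte count is 500. A plain `grep "500" | grep "192.168.1.1"` would have kept all three.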
