BASH

Extract Log Data with `grep` and `awk`

Efficiently filter and extract specific data from large log files (e.g., web server access logs) using a combination of `grep` for pattern matching and `awk` for column-based extraction.

#!/bin/bash

# Configuration
LOG_FILE="/var/log/nginx/access.log" # Example: /var/log/apache2/access.log
SEARCH_STRING="500"
IP_ADDRESS="192.168.1.1"

echo "Searching for logs containing '$SEARCH_STRING' and from IP '$IP_ADDRESS' in $LOG_FILE"

if [ ! -f "$LOG_FILE" ]; then
    echo "Error: Log file not found at $LOG_FILE" >&2
    exit 1
fi

# Example: Filter Nginx access log for 500 errors from a specific IP
# and extract timestamp, request method, URL, and user agent.
# Nginx log format often looks like: 
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"

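# Grep for the search string and IP, then pipe to awk for column extraction.
# -F treats both patterns as fixed strings, so the dots in the IP are literal.
# These are still substring matches anywhere in the line: "500" also matches a
# byte count such as 1500, and "192.168.1.1" also matches 192.168.1.10.
# With awk's default whitespace splitting, a common Nginx log line yields:
# $1=IP, $4=[Timestamp, $5=Timezone], $6="Method, $7=URL, $8=Protocol", $9=Status,
# $10=Bytes, $11=Referer, $12 onward=User agent

grep -F -- "$SEARCH_STRING" "$LOG_FILE" | \
  grep -F -- "$IP_ADDRESS" | \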
  awk '{
    timestamp = substr($4, 2) " " substr($5, 1, length($5)-1); # Remove brackets
    method = substr($6, 2); # Drop the leading quote of "$request"
    url = $7;
    status = $9;
    # The user agent is the last quoted field and usually contains spaces,
    # so rebuild it from field 12 up to the field that ends with a quote.
    # The starting index might vary based on log format and quoting.
    user_agent = "";
    for (i = 12; i <= NF; i++) {
        user_agent = (user_agent == "" ? $i : user_agent " " $i);
        if ($i ~ /"$/) break;
    }
    gsub(/^"|"$/, "", user_agent); # Remove surrounding quotes

    printf "Timestamp: %s | Method: %s | URL: %s | Status: %s | User-Agent: %s\n",
           timestamp, method, url, status, user_agent;
  }'

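# Capture the exit status of every pipeline stage right away; $? alone would
# only report awk, so a grep that matched nothing would go unnoticed.
pipeline_status=("${PIPESTATUS[@]}")

if [ "${pipeline_status[0]}" -eq 0 ] && [ "${pipeline_status[1]}" -eq 0 ] && [ "${pipeline_status[2]}" -eq 0 ]; then
    printf '\nLog extraction complete.\n'
else
    printf '\nLog extraction encountered an issue or no matching entries found.\n' >&2
fi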
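
The field numbers used in the script depend on how `awk` splits a line on whitespace. As a quick way to check them against your own log format, the sketch below prints every field of one line with its index. The sample line and the `show_fields` helper are made up for illustration and are not part of the script above.

```shell
#!/bin/bash

# Made-up sample line in the Nginx format described in the script comments
sample_line='192.168.1.1 - - [10/Oct/2023:13:55:36 +0000] "GET /api/items HTTP/1.1" 500 1500 "-" "Mozilla/5.0 (X11; Linux x86_64)"'

# Hypothetical helper: print each whitespace-separated field with its awk index
show_fields() {
    awk '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }'
}

printf '%s\n' "$sample_line" | show_fields
```

Running this against a line from your real log shows whether the method, URL, status, and user agent land in fields 6, 7, 9, and 12 as the script assumes, and that quoted values keep their quote characters after splitting.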

How it works: The script narrows a large log file, such as a web server access log, in two stages. `grep` first keeps only the lines that contain `SEARCH_STRING` (for example, the HTTP status code `500`) and `IP_ADDRESS`. The surviving lines are piped to `awk`, which splits each one into whitespace-separated fields, picks out the timestamp, request method, URL, status, and user agent, and prints them as one labeled line per request. Because `grep` discards non-matching lines before `awk` parses anything, the field extraction runs only on the entries of interest, which helps when debugging errors reported for a particular client.
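
Because `grep` matches its pattern anywhere in the line, a search for `500` can also hit a response size of 1500 bytes, and `192.168.1.1` can also hit `192.168.1.10`. One possible alternative, sketched here under the assumption that the log uses the format shown in the script comments, is to compare whole fields in `awk`. The `filter_exact` helper and the sample log are hypothetical and only serve the illustration.

```shell
#!/bin/bash

# Hypothetical helper: print lines whose client IP ($1) and status ($9)
# match exactly, assuming the log format described in the script comments.
filter_exact() {
    local ip="$1" status="$2" log_file="$3"
    awk -v ip="$ip" -v status="$status" '$1 == ip && $9 == status' "$log_file"
}

# Small, made-up sample log; removed again when the shell exits
sample_log="$(mktemp)"
trap 'rm -f "$sample_log"' EXIT
cat > "$sample_log" <<'EOF'
192.168.1.1 - - [10/Oct/2023:13:55:36 +0000] "GET /api/items HTTP/1.1" 500 1500 "-" "Mozilla/5.0 (X11; Linux x86_64)"
192.168.1.10 - - [10/Oct/2023:13:55:37 +0000] "GET /index.html HTTP/1.1" 500 612 "-" "curl/8.0.1"
192.168.1.1 - - [10/Oct/2023:13:55:38 +0000] "GET /big.bin HTTP/1.1" 200 1500 "-" "curl/8.0.1"
EOF

# Only the first sample line has both the exact IP and the exact status
filter_exact "192.168.1.1" "500" "$sample_log"
```

The output of `filter_exact` can be piped into the same `awk` extraction step in place of the two `grep` calls. The trade-off is that `awk` then parses every line of the file, which may be slower than a `grep` prefilter on very large logs.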
