BASH
Extract Log Data with `grep` and `awk`
Efficiently filter and extract specific data from large log files (e.g., web server access logs) using a combination of `grep` for pattern matching and `awk` for column-based extraction.
#!/bin/bash
# Configuration
LOG_FILE="/var/log/nginx/access.log" # Example: /var/log/apache2/access.log
SEARCH_STRING="500"
IP_ADDRESS="192.168.1.1"
echo "Searching for logs containing '$SEARCH_STRING' and from IP '$IP_ADDRESS' in $LOG_FILE"
if [ ! -f "$LOG_FILE" ]; then
    echo "Error: Log file not found at $LOG_FILE" >&2
    exit 1
fi
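# Example: filter an Nginx access log for 500 errors from a specific IP
# and extract the timestamp, request method, URL, status, and user agent.
# The default Nginx "combined" log format is:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
# grep -F treats each pattern as a literal string, so the dots in the IP are not wildcards.
# Both greps match anywhere in the line: "500" can also match a byte count or URL,
# and "192.168.1.1" also matches "192.168.1.10".
# Whitespace-separated awk fields for the combined format:
# $1=IP, $4=Timestamp (with leading [), $5=Timezone offset (with trailing ]),
# $6=Method (with leading "), $7=URL, $8=Protocol (with trailing "), $9=Status,
# $10=Bytes sent, $11=Referer, $12 onward=User agent
grep -F -- "$SEARCH_STRING" "$LOG_FILE" | \
grep -F -- "$IP_ADDRESS" | \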
awk '{
    # Strip the brackets around the timestamp and timezone offset
    timestamp = substr($4, 2) " " substr($5, 1, length($5) - 1)
    method = substr($6, 2)   # Drop the leading double quote, e.g. "GET -> GET
    url = $7
    status = $9
    user_agent = ""

    # In the combined format the quoted user agent starts at $12 and may span
    # several whitespace-separated fields; the position varies with the log format.
    if (NF >= 12) {
        user_agent = $12
        if (substr($12, 1, 1) == "\"") {
            # Append following fields until the value ends with the closing quote
            for (i = 13; i <= NF && user_agent !~ /"$/; i++) {
                user_agent = user_agent " " $i
            }
            # Remove the surrounding quotes
            user_agent = substr(user_agent, 2, length(user_agent) - 2)
        }
    }

    printf "Timestamp: %s | Method: %s | URL: %s | Status: %s | User-Agent: %s\n",
        timestamp, method, url, status, user_agent
}'
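# $? alone would report only the exit status of awk, so inspect every pipeline stage.
# grep exits with 1 when it selects no lines and with 2 or more on an error.
statuses=("${PIPESTATUS[@]}")
if [ "${statuses[0]}" -gt 1 ] || [ "${statuses[1]}" -gt 1 ] || [ "${statuses[2]}" -ne 0 ]; then
    echo "Log extraction encountered an issue." >&2
    exit 1
elif [ "${statuses[1]}" -ne 0 ]; then
    printf '\nNo matching entries found.\n'
else
    printf '\nLog extraction complete.\n'
fi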
How it works: The script first checks that the log file exists, then chains two `grep` calls to keep only the lines that contain both `SEARCH_STRING` (e.g., an HTTP status code such as `500`) and `IP_ADDRESS`. The surviving lines are piped to `awk`, which splits each line on whitespace, extracts the timestamp, request method, URL, status, and user agent, and prints them as one labelled, pipe-delimited record per line. Because `grep` matches its pattern anywhere in a line, a search for `500` also keeps lines where 500 appears in the URL or the byte count, so the results may need a second look on busy logs. The approach is useful for debugging and analyzing web application behavior without loading a large log file into an editor.
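Substring matching can be avoided by comparing whole fields inside `awk`. The sketch below is one possible variant, not part of the original script. It assumes the default Nginx "combined" log format, and the function name `filter_by_fields` and the two sample log lines are made up for illustration. It matches the client IP (`$1`) and the status code (`$9`) exactly, and reads the user agent by splitting the line on double quotes instead of rejoining whitespace-separated fields.

```bash
#!/bin/bash
# Hypothetical helper (illustrative name): print timestamp, method, URL, status,
# and user agent for lines whose client IP and status code match exactly.
# Assumes the default Nginx "combined" log format.
filter_by_fields() {
    local ip="$1" status="$2" log_file="$3"
    awk -v ip="$ip" -v status="$status" '
        $1 == ip && $9 == status {
            # Splitting on double quotes isolates the quoted values:
            # q[2] is the request line and q[6] is the user agent.
            split($0, q, "\"")
            timestamp = substr($4, 2) " " substr($5, 1, length($5) - 1)
            method = substr($6, 2)   # drop the leading double quote
            printf "%s | %s | %s | %s | %s\n", timestamp, method, $7, $9, q[6]
        }
    ' "$log_file"
}

# Example usage with two invented log lines (sample data, not real traffic).
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
192.168.1.1 - - [10/Oct/2023:13:55:36 +0000] "GET /api/items HTTP/1.1" 500 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"
192.168.1.10 - - [10/Oct/2023:13:55:37 +0000] "GET /img/500.png HTTP/1.1" 200 1500 "-" "curl/8.0"
EOF
filter_by_fields "192.168.1.1" "500" "$sample_log"
# prints: 10/Oct/2023:13:55:36 +0000 | GET | /api/items | 500 | Mozilla/5.0 (X11; Linux x86_64)
rm -f "$sample_log"
```

The second sample line contains both `500` and `192.168.1.1` as substrings, so a plain two-stage `grep` would keep it, while the field comparison drops it. The trade-off is that this variant is tied to the field positions of one log format, whereas `grep` works on any line-oriented log.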