
Extract Unique Domain Names from a List of URLs

A short Bash pipeline that parses a file of URLs (one per line) and prints only the unique domain names. Useful for summarizing URLs pulled from web server logs, auditing link lists, or building whitelists/blacklists.

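#!/bin/bash

URL_FILE="${1:-urls.txt}" # File with one URL per line (first argument, default urls.txt)

if [ ! -f "$URL_FILE" ]; then
  echo "Error: URL file '$URL_FILE' not found." >&2
  exit 1
fi

echo "Extracting unique domains from '$URL_FILE':"

# Extract hostnames with sed, normalize case, strip "www.", then deduplicate.
# sed reads the file directly (no cat needed). With -n, the trailing "p" flag
# prints only lines where the substitution matched, so blank lines and lines
# without a scheme are skipped instead of being echoed unchanged.
sed -E -n 's/^[[:alpha:]]+:\/\/([[:alnum:]_.-]+).*/\1/p' "$URL_FILE" | \
  tr '[:upper:]' '[:lower:]' | \
  sed -E 's/^www\.//' | \
  sort -u

# Explanation of the first sed regex:
# ^[[:alpha:]]+:\/\/       - Matches the scheme: 'http://', 'https://', 'ftp://', etc.
# ([[:alnum:]_.-]+)        - Captures the hostname (alphanumeric, underscore, dot, hyphen);
#                            the capture stops at ':', '/', '?' or '#', so ports are dropped
# .*                       - Matches the rest of the URL (port, path, query)
# Limitation: for 'user:pass@host' URLs the capture is 'user', not the host.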
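To see what the extraction captures without creating urls.txt, the same steps can be wrapped in a function that reads standard input. This is a minimal sketch: `extract_domains` is a hypothetical helper name, and this version lowercases hostnames and uses `sed -n ... p` so that lines without a scheme are dropped rather than passed through.

```bash
#!/bin/bash
# Hypothetical helper: reads URLs on stdin, prints unique domains.
extract_domains() {
  sed -E -n 's/^[[:alpha:]]+:\/\/([[:alnum:]_.-]+).*/\1/p' |  # keep the hostname only
    tr '[:upper:]' '[:lower:]' |                             # hostnames are case-insensitive
    sed -E 's/^www\.//' |                                    # drop a leading "www."
    sort -u                                                  # sort and deduplicate
}

# Sample input: a port, a query string, mixed case, a duplicate after
# www-stripping, a non-http scheme, and a line with no scheme at all.
extract_domains <<'EOF'
https://www.example.com/index.html
http://example.com:8080/login?next=/home
HTTPS://Docs.Example.org/guide#intro
ftp://files.example.net/pub/file.tar.gz
not-a-url
https://sub.example.com
EOF
```

On this input the function prints docs.example.org, example.com, files.example.net and sub.example.com, one per line. The `:8080` port and the query string are discarded, `www.example.com` and `example.com` collapse into one entry, and `not-a-url` produces no output.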

How it works: The script reads the file of URLs and extracts the hostname from each line. The first `sed` command uses a regular expression to strip the scheme (http://, https://, and so on) along with any port, path, or query string, leaving only the hostname. A second `sed` removes a leading 'www.' so that www.example.com and example.com count as one domain. Finally, `sort -u` sorts the hostnames alphabetically and removes duplicates, leaving one line per unique domain.
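The sed capture class stops at the first character it does not list, so a URL such as https://user:pass@host/ yields the username instead of the host. One alternative, sketched below with awk instead of sed, splits each line on '/' so that the third field holds the whole authority part (credentials, host, port) and then trims that field step by step. The function name `extract_domains_awk` is hypothetical; the scheme pattern follows the RFC 3986 scheme syntax (a letter, then letters, digits, '+', '-' or '.'); and the sketch assumes `scheme://host...` lines, so it does not handle IPv6 literals such as `[::1]`.

```bash
#!/bin/bash
# Sketch: extract unique domains with awk instead of sed.
# Splitting "scheme://authority/path" on "/" gives $1 = "scheme:",
# $2 = "" (between the two slashes) and $3 = the authority part.
extract_domains_awk() {
  awk -F/ '
    $1 ~ /^[A-Za-z][A-Za-z0-9+.-]*:$/ && $2 == "" && NF >= 3 {
      host = $3
      sub(/[?#].*/, "", host)  # drop a query or fragment that follows the host directly
      sub(/^.*@/, "", host)    # drop user:password@ credentials
      sub(/:.*/, "", host)     # drop the :port
      host = tolower(host)
      sub(/^www\./, "", host)  # drop a leading "www."
      if (host != "") print host
    }' "$@" | sort -u
}
```

Usage: `extract_domains_awk urls.txt`, or pipe URLs into it on standard input. For example, `printf '%s\n' 'https://alice:secret@www.Example.com:8443/admin' | extract_domains_awk` prints `example.com`. The query and fragment are removed before the credentials so that an '@' inside a query string is not mistaken for a userinfo separator.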
