BASH
Generate a Basic `sitemap.xml` for Static Sites
Automatically generate a simple `sitemap.xml` file for static websites by scanning local HTML files, improving SEO and site discoverability.
#!/bin/bash
# Configuration
BASE_URL="https://yourwebsite.com"
SITEMAP_FILE="sitemap.xml"
TARGET_DIR="." # Directory to scan for HTML files (e.g., 'public', 'dist')
echo "Generating sitemap.xml for $BASE_URL from directory $TARGET_DIR..."
# Start sitemap XML structure
echo '<?xml version="1.0" encoding="UTF-8"?>' > "$SITEMAP_FILE"
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' >> "$SITEMAP_FILE"
# Find HTML files and add them to the sitemap
root="${TARGET_DIR%/}" # Tolerate a trailing slash in TARGET_DIR
find "$root" -type f -name "*.html" | while IFS= read -r file; do
    # Get the path relative to TARGET_DIR and remove the .html extension
    # Example (TARGET_DIR="public"): public/about/index.html -> about/index
    relative_path="${file#"$root"/}"
    relative_path_no_ext="${relative_path%.html}"

    # Determine the final URL segment
    if [[ "$relative_path_no_ext" == "index" ]]; then
        final_url_segment="" # Root index.html maps to the site root
    elif [[ "$relative_path_no_ext" == */index ]]; then
        # 'about/index' becomes 'about'
        final_url_segment="${relative_path_no_ext%/index}"
    else
        final_url_segment="$relative_path_no_ext"
    fi

    # Construct the full URL. Stripping a trailing slash from BASE_URL here
    # avoids a double slash without touching the '//' in 'https://'
    # (a global s|//|/| substitution would turn it into 'https:/').
    url="${BASE_URL%/}/${final_url_segment}"

    # 'date -r FILE' prints the file's modification time with GNU date;
    # other date implementations may need a different command.
    {
        echo "  <url>"
        echo "    <loc>$url</loc>"
        echo "    <lastmod>$(date -r "$file" +"%Y-%m-%d")</lastmod>"
        echo "    <changefreq>weekly</changefreq>"
        echo "    <priority>0.8</priority>"
        echo "  </url>"
    } >> "$SITEMAP_FILE"
done
# End sitemap XML structure
echo '</urlset>' >> "$SITEMAP_FILE"
echo "Sitemap generated successfully at $SITEMAP_FILE"
How it works: This script generates a basic `sitemap.xml` file for a static website. It scans the directory set in `TARGET_DIR` for `.html` files and builds each URL under `BASE_URL` by dropping the `.html` extension, so the root `index.html` maps to the site root and `about/index.html` maps to `/about`. Each page is written as a `<url>` entry with a `lastmod` date taken from the file's modification time and fixed `changefreq` and `priority` values. A sitemap gives search engines an explicit list of pages to crawl, which helps them discover and index the site's content.
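The path-to-URL mapping described above can be sketched as a standalone function, which makes it easy to check a few paths before generating a full sitemap. The helper names `xml_escape` and `path_to_url` are hypothetical and not part of the original script. The sketch also entity-escapes the URL, since the sitemap protocol requires characters such as `&` to be escaped inside `<loc>`; it does not percent-encode spaces or non-ASCII characters.

```shell
#!/bin/bash
# Hypothetical helpers (not in the original script): map a file path to the
# URL that would be written into <loc>.

# Escape the five XML special characters. '\&' is a literal ampersand in a
# sed replacement; a bare '&' would mean "the matched text".
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
                           -e 's/"/\&quot;/g' -e "s/'/\&apos;/g"
}

# Usage: path_to_url BASE_URL TARGET_DIR FILE
path_to_url() {
    local base="${1%/}"          # tolerate a trailing slash in the base URL
    local root="${2%/}"          # and in the scanned directory
    local rel="${3#"$root"/}"    # path relative to the scanned directory
    rel="${rel%.html}"           # drop the extension
    case "$rel" in
        index)   rel="" ;;                  # root index.html -> site root
        */index) rel="${rel%/index}" ;;     # about/index -> about
    esac
    xml_escape "$base/$rel"
}

path_to_url "https://yourwebsite.com/" "." "./index.html"; echo
# -> https://yourwebsite.com/
path_to_url "https://yourwebsite.com" "public" "public/about/index.html"; echo
# -> https://yourwebsite.com/about
path_to_url "https://yourwebsite.com" "." "./docs/a&b.html"; echo
# -> https://yourwebsite.com/docs/a&amp;b
```

Keeping the mapping in one function means the scheme's `//` is never rewritten, because the only slash handling is trimming one trailing slash from each input.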