BASH

Generate a Basic `sitemap.xml` for Static Sites

Automatically generate a simple `sitemap.xml` file for static websites by scanning local HTML files, improving SEO and site discoverability.

#!/bin/bash

# Configuration
BASE_URL="https://yourwebsite.com"
SITEMAP_FILE="sitemap.xml"
TARGET_DIR="." # Directory to scan for HTML files, without a trailing slash (e.g., 'public', 'dist')

echo "Generating sitemap.xml for $BASE_URL from directory $TARGET_DIR..."

# Start sitemap XML structure
echo '<?xml version="1.0" encoding="UTF-8"?>' > "$SITEMAP_FILE"
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' >> "$SITEMAP_FILE"

# Find HTML files and add them to the sitemap
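# (-print0 with read -d '' keeps filenames containing spaces intact)
find "$TARGET_DIR" -type f -name "*.html" -print0 | while IFS= read -r -d '' file; do
  # Get path relative to TARGET_DIR, remove .html extension
  # Example (TARGET_DIR=./public): ./public/about/index.html -> about/index
  # Parameter expansion avoids treating TARGET_DIR as a regex (e.g. '.')
  relative_path="${file#"$TARGET_DIR"/}"
  relative_path_no_ext="${relative_path%.html}"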
  
  # Determine the final URL segment
  final_url_segment=""
  if [[ "$relative_path_no_ext" == "index" ]]; then
    final_url_segment="" # Root index.html maps to the site root
  elif [[ "$relative_path_no_ext" == */index ]]; then
    # A subdirectory index page maps to its directory: 'about/index' -> 'about'
    final_url_segment="${relative_path_no_ext%/index}"
  else
    final_url_segment="$relative_path_no_ext"
  fi

  # Construct the full URL. Strip any trailing slash from BASE_URL first so
  # "https://example.com/" and "https://example.com" behave the same.
  # (A blanket s|//|/|g would also break the "https://" scheme separator.)
  base="${BASE_URL%/}"
  if [[ -n "$final_url_segment" ]]; then
    url="${base}/${final_url_segment}"
  else
    url="${base}/" # Root index.html
  fi
  
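  # Escape XML special characters; '&' in particular must become '&amp;' in <loc>
  loc=$(printf '%s' "$url" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e "s/'/\&apos;/g" -e 's/"/\&quot;/g')
  # Last-modified date from the file's mtime (GNU date: -r FILE)
  lastmod=$(date -r "$file" +"%Y-%m-%d")

  {
    echo "  <url>"
    echo "    <loc>$loc</loc>"
    echo "    <lastmod>$lastmod</lastmod>"
    # changefreq and priority are optional hints in the sitemaps.org protocol
    echo "    <changefreq>weekly</changefreq>"
    echo "    <priority>0.8</priority>"
    echo "  </url>"
  } >> "$SITEMAP_FILE"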
done

# End sitemap XML structure
echo '</urlset>' >> "$SITEMAP_FILE"

echo "Sitemap generated successfully at $SITEMAP_FILE"
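How it works: This script generates a basic `sitemap.xml` file for a static website. It recursively scans `TARGET_DIR` for `.html` files and turns each file path into a URL under `BASE_URL`: the `.html` extension is dropped (so it assumes your host serves pages at extensionless URLs), a root `index.html` maps to the site root, and a subdirectory page such as `about/index.html` maps to `/about`. Each URL becomes a `<url>` entry whose `<lastmod>` date comes from the file's modification time, with fixed `changefreq` and `priority` values. A sitemap like this helps search engines discover and crawl every page, including pages that few internal links point to.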
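To check the path-to-URL rules described above without scanning a real site, they can be pulled into a small function. This is a sketch: `path_to_url` is a hypothetical helper name, and it assumes the path passed in is already relative to `TARGET_DIR`.

```bash
#!/bin/bash
# Hypothetical helper: map a path relative to TARGET_DIR to a sitemap URL,
# applying the extension and index.html rules described above.
path_to_url() {
  local base="${1%/}"      # base URL without a trailing slash
  local rel="${2%.html}"   # relative path with the .html extension removed
  case "$rel" in
    index)   printf '%s/\n' "$base" ;;                   # root index.html -> site root
    */index) printf '%s/%s\n' "$base" "${rel%/index}" ;; # about/index.html -> about
    *)       printf '%s/%s\n' "$base" "$rel" ;;          # blog/post.html -> blog/post
  esac
}

# Usage examples
path_to_url "https://example.com/" "index.html"          # -> https://example.com/
path_to_url "https://example.com" "about/index.html"     # -> https://example.com/about
path_to_url "https://example.com" "blog/first-post.html" # -> https://example.com/blog/first-post
```

Keeping the mapping in a function makes it easy to try edge cases, such as a base URL with a trailing slash, before running the generator over a whole site.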
