JAVASCRIPT
Extract Content Between Specific HTML Tags
Use a regular expression to extract the inner content of a specified HTML or XML tag, a lightweight way to pull data out of simple, well-formed markup without a full DOM parser.
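function extractTagContent(htmlString, tagName) {
  // Escape regex metacharacters so the tag name is matched literally.
  const safeTag = tagName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // (?=[\s>]) keeps "p" from matching "<pre>"; [\s\S]*? lazily spans newlines.
  const regex = new RegExp(`<${safeTag}(?=[\\s>])[^>]*>([\\s\\S]*?)<\\/${safeTag}\\s*>`, 'gi');
  const matches = [];
  let match;
  // With the 'g' flag, exec() resumes from lastIndex on each call.
  while ((match = regex.exec(htmlString)) !== null) {
    matches.push(match[1].trim());
  }
  return matches;
}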
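How it works: The function builds a regular expression from the tag name and collects the captured inner content of every match. The 'g' flag lets exec() step through all occurrences, the 'i' flag makes the tag name case-insensitive, [^>]* skips any attributes in the opening tag, and the lazy [\s\S]*? spans line breaks up to the nearest closing tag. The captured content is returned trimmed and still includes any nested markup. Because each match ends at the first closing tag, nested elements of the same name are cut short, so this suits simple, well-formed snippets rather than arbitrary HTML; use a DOM parser for anything more complex.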
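Usage: The following is a minimal sketch of how the function might be called, not a definitive implementation. The function is restated in compact form so the snippet runs on its own; the use of matchAll in place of the exec() loop, the tag-name escaping, and the sample strings listHtml and nestedHtml are assumptions of this sketch. The second call shows how the match behaves when same-name tags are nested.

```javascript
// Compact restatement of extractTagContent so this snippet runs on its own.
function extractTagContent(htmlString, tagName) {
  // Escape regex metacharacters in the tag name (an addition in this sketch).
  const safeTag = tagName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`<${safeTag}(?=[\\s>])[^>]*>([\\s\\S]*?)<\\/${safeTag}\\s*>`, 'gi');
  // matchAll requires the 'g' flag and yields one match object per occurrence.
  return Array.from(htmlString.matchAll(regex), (m) => m[1].trim());
}

// Hypothetical sample input: attributes, mixed case, and surrounding whitespace.
const listHtml = '<ul><li class="a">One</li>\n<LI>\n  Two\n</LI></ul>';
const items = extractTagContent(listHtml, 'li');
console.log(items); // [ 'One', 'Two' ]

// Nested same-name tags: the lazy match stops at the first closing tag.
const nestedHtml = '<div>outer <div>inner</div> tail</div>';
const nested = extractTagContent(nestedHtml, 'div');
console.log(nested); // [ 'outer <div>inner' ]
```

The second result is truncated at the inner closing tag, which is why a DOM parser is the safer choice whenever elements of the same name can nest.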