JAVASCRIPT
Remove HTML Tags from User Input
Learn to strip HTML tags from user-generated content with a regular expression in JavaScript. This is useful for turning markup into plain text, but on its own it is not a reliable defense against XSS vulnerabilities.
const stripHtmlTags = (htmlString) => {
  // Matches a simple HTML tag such as <b>, </b>, or <p class="x">:
  // '<', then zero or more characters that are not '>', then '>'.
  // The g flag makes replace() remove every match, not just the first.
  const htmlTagRegex = /<[^>]*>/g;
  return htmlString.replace(htmlTagRegex, '');
};
const dirtyHtml = "<h1>Hello, <b>world</b>!</h1><p>This is a test with <script>alert('XSS');</script> and more.</p>";
const cleanText = stripHtmlTags(dirtyHtml);
console.log(cleanText); // "Hello, world!This is a test with alert('XSS'); and more."
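How it works: The `stripHtmlTags` function removes HTML tags from a string using a regular expression. In the pattern `/<[^>]*>/g`, `<` matches the opening angle bracket, `[^>]*` matches zero or more characters other than a closing angle bracket (`>`), and `>` matches the closing angle bracket. The `g` flag makes `replace` substitute an empty string for *every* match rather than only the first. Note that only the tags are removed, not the content between them: in the example above, the text `alert('XSS');` from the `<script>` element is still present in the output. The pattern also stops at the first `>`, so a `>` inside an attribute value (for example `title="1 > 0"`) leaves part of the tag behind. Tag stripping like this is handy for producing plain text, but it is not a complete XSS defense; when displaying user input, escape it or pass it through a vetted HTML sanitizer instead.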
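The two limitations described above are easy to reproduce. This sketch repeats the same regex-based helper so it runs on its own, then feeds it a `<script>` element and an attribute value containing `>`; the inputs are illustrative only.

```javascript
// Same regex-based helper as above, repeated so this snippet is self-contained.
const stripHtmlTags = (htmlString) => htmlString.replace(/<[^>]*>/g, '');

// 1. Only the <script> and </script> tags are removed; the code between them stays.
const scriptLeftover = stripHtmlTags("<script>alert('XSS');</script>");
console.log(scriptLeftover); // alert('XSS');

// 2. The '>' inside the attribute value ends the first match early,
//    so the rest of the opening tag leaks into the output.
const attributeLeftover = stripHtmlTags('<a title="1 > 0">link</a>');
console.log(JSON.stringify(attributeLeftover)); // " 0\">link"
```

If the goal is plain text without script bodies, a common workaround is to remove `<script>`/`<style>` elements first, but any regex approach remains fragile against malformed or deliberately crafted markup.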
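When user input only needs to be *displayed*, escaping is usually more robust than stripping, because nothing the user types is ever interpreted as markup. Below is a minimal sketch of a hypothetical `escapeHtml` helper (the name is an assumption, not a built-in) that replaces the five HTML-significant characters with entities. In the browser, assigning to `element.textContent` achieves the same effect without a helper, and if some markup must be kept, a dedicated sanitizer such as DOMPurify is the usual choice.

```javascript
// Hypothetical helper: escape characters that HTML treats as markup,
// so the result is rendered as literal text instead of being parsed.
const escapeHtml = (text) =>
  String(text)
    .replace(/&/g, '&amp;') // must run first, or the entities added below would be double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');

const userInput = "<script>alert('XSS');</script>";
console.log(escapeHtml(userInput));
// &lt;script&gt;alert(&#39;XSS&#39;);&lt;/script&gt;
```

Unlike `stripHtmlTags`, this keeps the user's text intact (including any `<` or `>` they typed on purpose) while preventing it from being parsed as HTML.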