JAVASCRIPT
Extract All Words from a String
Extract an array of words from a text string with a JavaScript regular expression, skipping numbers, punctuation, and extra whitespace.
function extractWords(text) {
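// Matches runs of ASCII letters and Latin-1 accented letters (À-ÿ), not every Unicode letter.
// Without the u flag, \b treats only [A-Za-z0-9_] as word characters.
const wordRegex = /\b[A-Za-zÀ-ÿ]+\b/g; // À-ÿ covers common Western European accents (the range also contains × and ÷)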
const matches = text.match(wordRegex);
return matches || []; // Return an empty array if no matches
}
// Examples
const sentence = "Hello world! How are you in 2023? I'm fine.";
console.log(extractWords(sentence)); // [ 'Hello', 'world', 'How', 'are', 'you', 'in', 'I', 'm', 'fine' ] (the apostrophe splits "I'm")
const unicodeText = "Français español übermensch cafés";
console.log(extractWords(unicodeText)); // [ 'Français', 'español', 'bermensch', 'cafés' ] (no \b before the leading 'ü', so the match starts at 'b')
const noWords = "123 !@# $ %";
console.log(extractWords(noWords)); // []
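How it works: The `extractWords` function uses the regular expression `/\b[A-Za-zÀ-ÿ]+\b/g` to find every word in a string. `[A-Za-zÀ-ÿ]+` matches one or more ASCII letters or Latin-1 accented characters, so digits, punctuation, and whitespace never appear in a match. The `\b` word-boundary assertions keep a match from starting or ending in the middle of a run of letters, digits, or underscores, so a token such as `abc123` is skipped entirely. The `g` flag makes `match` return an array of all matches rather than only the first; when there are none, `match` returns `null`, and the `|| []` fallback turns that into an empty array. Two limitations follow from the pattern. An apostrophe ends a match, so a contraction such as `don't` comes back as `don` and `t`. And because `\b` recognizes only `[A-Za-z0-9_]` as word characters, a word that begins or ends with an accented letter is clipped (`café` yields `caf`), while letters outside the `À-ÿ` range, such as Greek or Cyrillic, are not matched at all.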
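
If full Unicode support matters more than the ASCII-style word boundaries, one possible variant relies on the `\p{L}` Unicode property escape, which requires the `u` flag (ES2018 and later). The sketch below is an illustration, not part of the original snippet: the function name `extractWordsUnicode` is hypothetical, and treating an inner apostrophe as part of a word is an assumption about what should count as one word.

```javascript
// Sketch of a Unicode-aware variant; `extractWordsUnicode` is a hypothetical name.
function extractWordsUnicode(text) {
  // \p{L}+ matches one or more letters from any script (needs the u flag).
  // (?:['’]\p{L}+)* optionally continues through an inner apostrophe, as in "I'm".
  const wordRegex = /\p{L}+(?:['’]\p{L}+)*/gu;
  return text.match(wordRegex) || []; // match returns null when nothing is found
}

console.log(extractWordsUnicode("Hello world! How are you in 2023? I'm fine."));
// [ 'Hello', 'world', 'How', 'are', 'you', 'in', "I'm", 'fine' ]
console.log(extractWordsUnicode("Français español übermensch cafés"));
// [ 'Français', 'español', 'übermensch', 'cafés' ]
console.log(extractWordsUnicode("123 !@# $ %"));
// []
```

Because this version drops `\b`, letters that sit next to digits are still extracted: `abc123` yields `abc` instead of being skipped. Whether that is desirable depends on how the input is expected to look.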