I recently wrote some code to change post titles into slugs (permalinks). Blogs like WordPress clamp down quite hard on the symbols allowed in a post slug, e.g. swapping spaces for dashes and stripping most ASCII non-alphanumeric characters. Most of these transformations can be tackled with some simple regular expressions (and one toLowerCase), but removing diacritics from letters requires a bit more work.
Luckily I found a nice function to help me (updated here). The function contains a big array of base glyphs (and digraphs, trigraphs, etc.), each paired with a regex that matches the variants of that letter with diacritical marks, or the Unicode di- or trigraph, e.g.
{ "base": "b", "letters": /[\u0062\u24D1\uFF42\u1E03\u1E05\u1E07\u0180\u0183\u0253]/g },
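To make the shape of that array concrete, here is a minimal sketch of how entries like the one above could be applied to a title. The names diacriticsMap, removeDiacritics and slugify are hypothetical, and the two-entry map is a tiny stand-in for the real array, so treat this as an illustration under those assumptions rather than the original function.

```javascript
// Hypothetical two-entry excerpt; the real array covers far more glyphs.
var diacriticsMap = [
    { base: "a", letters: /[\u0061\u00E0\u00E1\u00E2\u00E4]/g }, // a à á â ä
    { base: "e", letters: /[\u0065\u00E8\u00E9\u00EA\u00EB]/g }  // e è é ê ë
];

// Replace every variant matched by an entry's regex with its base glyph.
function removeDiacritics(str) {
    return diacriticsMap.reduce(function (result, entry) {
        return result.replace(entry.letters, entry.base);
    }, str);
}

// Hypothetical slug helper: lower-case, strip diacritics, drop other
// symbols, then collapse runs of whitespace or dashes into single dashes.
function slugify(title) {
    return removeDiacritics(title.toLowerCase())
        .replace(/[^a-z0-9\s-]/g, "")
        .trim()
        .replace(/[\s-]+/g, "-");
}

console.log(slugify("Caf\u00E9 \u00E0 la Cr\u00E8me!")); // cafe-a-la-creme
```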
I needed this array on the server (for storing stuff in the database) and on the client (to give snappy performance), but, as it’s quite large, I wanted to wait until the page had loaded before I requested it, which meant using JSON.
Sending a regex via JSON
Unfortunately JSON.stringify turns a RegExp into {} (a regex has no enumerable own properties to serialise), which is not quite what I had in mind.
If an object has a toJSON function then JSON.stringify calls it and serialises the return value instead of doing its default conversion, so I added a function that loops through the array and, if it comes across a regex, converts it to a string.
(I’ve assumed the array to send is called bigArray and that bigArrayJSON is an initially empty array cached outside this function.)
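The problem is easy to reproduce in a console. This small sketch uses a made-up entry (the variable names are only for illustration) to show that the regex is lost in the round trip:

```javascript
// A sample entry in the same shape as the big array's entries.
var entry = { base: "b", letters: /[b\u1E03]/g };

var json = JSON.stringify(entry);
console.log(json); // {"base":"b","letters":{}}

// After parsing, "letters" is a plain empty object, not a regex.
var roundTripped = JSON.parse(json);
console.log(roundTripped.letters instanceof RegExp); // false
```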
bigArray.toJSON = function () {
    var entry;
    // Reuse the cached copy if it has already been built.
    if (bigArrayJSON.length !== 0) return bigArrayJSON;
    for (var i = 0, len = bigArray.length; i < len; i++) {
        entry = {};
        // Copy each property, turning any regex into its "/pattern/flags" string.
        Object.keys(bigArray[i]).forEach(function (key) {
            var value = bigArray[i][key];
            entry[key] = value instanceof RegExp ? value.toString() : value;
        });
        bigArrayJSON.push(entry);
    }
    return bigArrayJSON;
};
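As an aside, a replacer function passed as the second argument to JSON.stringify is another way to get the same strings without attaching toJSON to the array. This is an alternative technique, not what the code above does, and sampleArray and regexReplacer are hypothetical names. Note that it does no caching, so the conversion runs on every call.

```javascript
// Hypothetical one-entry stand-in for the big array.
var sampleArray = [
    { base: "b", letters: /[bB]/g }
];

// JSON.stringify calls the replacer for every key/value pair;
// regexes are swapped for their "/pattern/flags" string form.
function regexReplacer(key, value) {
    return value instanceof RegExp ? value.toString() : value;
}

var json = JSON.stringify(sampleArray, regexReplacer);
console.log(json); // [{"base":"b","letters":"/[bB]/g"}]
```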
Converting a regex string back into a real regex
Now all I needed to do was convert the regex strings back into regexes, and it turned out I could do this with a one-liner.
The regex strings sent from the server will have the form /pattern/flags, which can be split up using /^\/(.*)\/(.*)/ into [pattern, flags] (with a quick slice to restrict it to the parenthesised matches only) – almost perfect for making a new regex. Unfortunately the RegExp constructor expects two parameters, not an array, but apply can handle this.
function saveBigArray(data) {
    bigArray = data.bigArray;
    for (var i = 0, len = bigArray.length; i < len; i++) {
        // Convert string regex to real regex. Use apply as exec returns an array.
        bigArray[i].letters = RegExp.apply(undefined, /^\/(.*)\/(.*)/.exec(bigArray[i].letters).slice(1));
    }
}
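To see what the split actually produces, here is the one-liner unpacked into a small helper. The name parseRegexString and the null return for non-matching input are my own assumptions for this sketch. Because the first group is greedy, the last slash in the string is the one treated as the pattern/flags separator, so slashes inside the pattern are left alone.

```javascript
// Hypothetical helper: turn "/pattern/flags" back into a RegExp,
// or return null if the string does not have that form.
function parseRegexString(str) {
    var match = /^\/(.*)\/(.*)/.exec(str);
    if (match === null) return null;
    // match is [whole string, pattern, flags]; slice(1) drops the whole match.
    var parts = match.slice(1);
    return RegExp.apply(undefined, parts);
}

// The string /a\/b/gi: the pattern contains an escaped slash.
var parts = /^\/(.*)\/(.*)/.exec("/a\\/b/gi").slice(1);
console.log(parts[0]); // a\/b
console.log(parts[1]); // gi

var regex = parseRegexString("/a\\/b/gi");
console.log(regex.test("A/B")); // true
```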
I'm a bit uncomfortable with the lack of new when calling RegExp. Testing suggests that it works just fine: RegExp("aa?b", "g") returns a new regex equivalent to the one from new RegExp("aa?b", "g"). Even so, I'm not happy relying on this. Any ideas on how to do a new with apply?
(It's possible to use eval on the string regex too, but I didn't want to do that. :)
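On the question of combining new with apply, a few options come to mind. This is a sketch rather than a recommendation: the variable names are made up, the bind trick needs an ES5 engine, and the spread and Reflect.construct versions need ES2015 or later.

```javascript
var args = ["aa?b", "g"]; // [pattern, flags], e.g. from exec(...).slice(1)

// Simplest: index into the array, so apply is not needed at all.
var direct = new RegExp(args[0], args[1]);

// ES5: pre-fill the arguments with bind, then call the bound function
// with new. The leading null is bind's thisArg, which new ignores.
var BoundRegExp = Function.prototype.bind.apply(RegExp, [null].concat(args));
var viaBind = new BoundRegExp();

// ES2015: spread the array straight into the constructor call...
var viaSpread = new RegExp(...args);

// ...or use Reflect.construct, which is effectively "new with apply".
var viaReflect = Reflect.construct(RegExp, args);

console.log(viaBind.toString()); // /aa?b/g
```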