Bret W. Lester

Multithreaded JavaScript in the Browser, or the Lack Thereof

WebOutLoud version 3.9.3 was released yesterday. For the majority of users, nothing of note changed, but for a minority of power users, something fundamental was fixed.

An email with the subject, “it stopped functioning,” from a French-Canadian lawyer or law student (idk which) was what brought the problem to my attention.

After some emails back and forth I figured out the problem only occurred on a particular web page that had a ton of content. You could scroll for days without reaching the end. Like an entire novel on a single page. Voluminous! You get the idea.

Anyway, without going into detail, this web page would effectively make the app “stop functioning” as the guy in the email said. You click and nothing happens—that kind of thing.

It didn’t take long to figure out that the root of the problem was WebOutLoud’s page parser. Written in JavaScript, the page parser is responsible for extracting text from a web page, leaving behind ads, navigation elements and other undesirable content. Think Safari’s reader mode.

For this particular web page, the page parser was taking on the order of minutes to complete—rendering the app unusable in the meantime with no feedback indicating to the user WTF is happening.

The solution? Put up some UI giving the user an option to cancel. Pretty straightforward in something like Swift, where background threads are an option, but in browser-based JavaScript, where everything that has to do with DOM manipulation is on the UI thread, it’s not so easy.

So I came up with a solution which I’ll loosely call “interval execution.” I’m ignorant of any precedents to this idea so apologies to the astute reader. I’d love to know what this is actually called. Basically the idea is to break the long-running task up into intervals of a given timespan. After each interval a callback is invoked to determine if the task should continue. The gaps between intervals give any JavaScript that has been queued since the task began a chance to run before the task finishes. So it accomplishes two things: (1) it prevents the page from locking up for any longer than the provided interval and (2) it provides a hook for canceling the long-running operation.
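Stripped of anything DOM-specific, the idea can be sketched over a plain array of work items. This is only an illustration under my own assumptions: the name runInIntervals and the onDone callback are made up for this example and are not part of WebOutLoud.

```javascript
// Hypothetical helper (not from WebOutLoud): run `work` over `items` in
// time slices of roughly `intervalMs`, asking `shouldContinue` between slices.
function runInIntervals(items, work, shouldContinue, intervalMs = 250, onDone = () => {}) {
  let index = 0; // next item to process; survives across slices

  function runSlice() {
    const sliceStart = performance.now();
    while (index < items.length) {
      if (performance.now() - sliceStart > intervalMs) {
        // Time is up for this slice. Yield so queued events (clicks,
        // timers, rendering) can run, then resume if the caller agrees.
        if (shouldContinue()) {
          setTimeout(runSlice, 1);
        } else {
          onDone(false); // cancelled before finishing
        }
        return;
      }
      work(items[index]);
      index++;
    }
    onDone(true); // every item was processed
  }

  runSlice();
}
```

The only state that has to survive between slices here is index. For a tree walk the equivalent state is the current node plus the index of the next child to visit.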

At its very heart WebOutLoud’s page parser is doing DOM traversal, performing some proprietary logic on the DOM tree in order to isolate the readable content. To demonstrate the “interval execution” idea, here’s a generic DOM traversal function utilizing interval execution.

function traverse(startingNode, workCallback, shouldContinue, intervalInMilliSeconds) {
    let interval = intervalInMilliSeconds || 250; // default to 1/4 second

    performTraverse(startingNode, 0);

    function performTraverse(node, startingChildIndex) {
        let startTime = window.performance.now();
        let currentNode = node;
        let currentIndex = startingChildIndex;

        outer: while (true) {
            for (let i = currentIndex; i < currentNode.childNodes.length; i++) {
                let childNode = currentNode.childNodes[i];

                if (window.performance.now() - startTime > interval) {
                    // if calling code says we should continue, do so after 1 ms giving any queued executions
                    // a chance to run and therefore not freeze the page.
                    if (shouldContinue()) {
                        window.setTimeout(function () {
                            performTraverse(currentNode, i);
                        }, 1);
                    }
                    return;
                }

                workCallback(childNode); // perform some work on this node

                // descend into this child; a childless node simply falls through the for loop below
                if (childNode.childNodes) {
                    currentNode = childNode;
                    currentIndex = 0;
                    continue outer;
                }
            }

            // all children of currentNode are done; stop at the root, otherwise climb to the next sibling
            if (currentNode === startingNode) {
                break;
            }

            currentIndex = indexOfNodeInParent(currentNode) + 1;
            currentNode = currentNode.parentNode;
        }
    }
}

// returns the index of a node in its parent
function indexOfNodeInParent(node) {
    let children = Array.prototype.slice.call(node.parentNode.childNodes);
    return children.indexOf(node);
}

The first parameter, startingNode, is the node whose descendants you want to traverse. The second parameter, workCallback, is called for every node descending from startingNode. This is where a significant portion of WebOutLoud's page parser logic would live, for example. The third parameter, shouldContinue, is called at the end of every interval. If it returns true, traversal will continue where it left off after 1 millisecond, and if it returns false, traversal will cease immediately. The last parameter, intervalInMilliSeconds, determines how long an interval will be. I'm not sure exactly what the sweet spot is for this parameter but it should definitely be less than a second to avoid the perception of a frozen webpage during very long-running traversals.
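To make the workCallback parameter a little more concrete, here's a toy callback that just collects the text of every text node it's handed. It's a hypothetical stand-in and nothing like WebOutLoud's actual parser logic. The name makeTextCollector is made up for this example.

```javascript
// Hypothetical work callback (not WebOutLoud's parser): gathers the
// trimmed text of every non-empty text node passed to it.
function makeTextCollector() {
  const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM
  const parts = [];
  return {
    workCallback(node) {
      if (node.nodeType === TEXT_NODE && node.nodeValue.trim() !== '') {
        parts.push(node.nodeValue.trim());
      }
    },
    getText() {
      return parts.join(' ');
    },
  };
}

const collector = makeTextCollector();
// In a browser, with traverse() from this post in scope:
// traverse(document.body, collector.workCallback, () => true, 100);
```

Since traverse as written has no completion callback, collector.getText() returns whatever has been collected so far, which may be a partial result if the traversal is still between intervals.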

And back to the solution to my particular problem. After integrating "interval execution" with WOL's page parser, I was able to present some UI where the user had the option to cancel an especially long parse, which is much better than leaving them to wonder why the hell "it stopped functioning." Under the hood, when the user hits the cancel button, some JavaScript code sets a flag in the page (after the end of the next interval), then the shouldContinue function gets wind of it and returns false. Problem solved.
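The flag wiring could look roughly like this. It's a sketch, not WOL's real code: createCancelToken and the "cancel-parse" button id are names I'm assuming purely for illustration.

```javascript
// Hypothetical cancel flag (names are assumptions, not WebOutLoud's code).
function createCancelToken() {
  let cancelled = false;
  return {
    // Uses the closure rather than `this`, so it can be passed
    // directly as an event listener.
    cancel() {
      cancelled = true;
    },
    shouldContinue() {
      return !cancelled;
    },
  };
}

const token = createCancelToken();

// Browser-only wiring, assuming a button with the made-up id "cancel-parse".
if (typeof document !== 'undefined') {
  const button = document.getElementById('cancel-parse');
  if (button) {
    button.addEventListener('click', token.cancel);
  }
}

// traverse(document.body, workCallback, token.shouldContinue, 100);
```

The click handler can't run while an interval is in progress because the UI thread is busy, so it fires in the 1 ms gap before the next call to performTraverse. The flag then takes effect the next time shouldContinue is called.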
