Let’s say you have a big JavaScript file, left over from the old days. It’s 70,000 lines long and you desperately need to split it up using webpack or a similar bundler. To do that, you first need to know which functions or constants it exposes to the global scope.
Let a computer read through your code and extract what you want from it.
It’s a job for Abstract Syntax Trees (AST).
The following example is small. Our mission, should you choose to accept it, will be to extract the names of all the functions exposed in the global scope.
// test the code
function decrementAndAdd(a, b) {
  function add(c, d) {
    return c + d;
  }
  a--;
  b = b - 1;
  return add(a, b);
}
// test the code
function incrementAndMultiply(a, b) {
  function multiply(c, d) {
    return c * d;
  }
  a++;
  b = b + 1;
  return multiply(a, b);
}
The result should be ["decrementAndAdd", "incrementAndMultiply"].
An AST is the result of parsing code. For JavaScript, an AST is a JavaScript object containing a tree representation of your source. Before we use it, we have to create it. Depending on the code we are parsing, we choose the appropriate parser.
Here, since the code is ES5-compatible, we can choose the acorn parser.
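To make "a JavaScript object containing a tree representation" concrete, here is a hand-written, abbreviated sketch of what a parser could return for the inner `add` function of the example. It follows the ESTree node shapes that acorn produces, but it is typed by hand for illustration: a real parser also attaches position data (`start`, `end`, and so on) to every node.

```javascript
// Hand-written, abbreviated AST for: function add(c, d) { return c + d; }
// Assumption: ESTree-style node shapes, with position data left out.
const ast = {
  type: 'Program',
  body: [
    {
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'add' },
      params: [
        { type: 'Identifier', name: 'c' },
        { type: 'Identifier', name: 'd' },
      ],
      body: {
        type: 'BlockStatement',
        body: [
          {
            type: 'ReturnStatement',
            argument: {
              type: 'BinaryExpression',
              operator: '+',
              left: { type: 'Identifier', name: 'c' },
              right: { type: 'Identifier', name: 'd' },
            },
          },
        ],
      },
    },
  ],
};

// Every node carries a `type`; the rest of its shape depends on that type.
const firstStatement = ast.body[0];
console.log(firstStatement.type);    // "FunctionDeclaration"
console.log(firstStatement.id.name); // "add"
```

The function name we are after lives in `id.name` of a `FunctionDeclaration` node, which is the property the rest of this article reads.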
Here are some of the most well-known open source ECMAScript parsers:
Parser | Supported Languages | GitHub |
---|---|---|
acorn | esnext & JSX (using acorn-jsx) | https://github.com/acornjs/acorn |
esprima | esnext & older | https://github.com/jquery/esprima |
cherow | esnext & older | https://github.com/cherow/cherow |
espree | esnext & older | https://github.com/eslint/espree |
shift | esnext & older | https://github.com/shapesecurity/shift-parser-js |
babel | esnext, JSX & typescript | https://github.com/babel/babel |
TypeScript | esnext & typescript | https://github.com/Microsoft/TypeScript |
All parsers work the same way: give them some code, get an AST.
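const { readFileSync } = require('fs');
const { Parser } = require('acorn');
// fileName is the path of the file to analyze; acorn 8 expects an explicit ecmaVersion
const ast = Parser.parse(readFileSync(fileName).toString(), { ecmaVersion: 2020 });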
The TypeScript parser's API is a little different, but it is well documented.
This is the tree obtained with @babel/parser when parsing:
// test the code
function decrementAndAdd(a, b) {
  return add(a, b)
}
In order to find what we are going to extract, it’s often better not to treat the whole AST at once. It’ll be a large object with thousands of nodes even for small code snippets. So, before we extract the information we need, we refine our search.
The best way to do that is to visit only the nodes one cares about.
Once again, many tools are available to do this traversing part. For our example, we are going to use recast. It’s very fast and has the advantage of keeping a version of your code untouched. This way, it can return the part of your code you want with its original formatting.
While traversing, we’ll find all the function declaration nodes. This is why we use the visitFunctionDeclaration method.
If we wanted to look at variable assignments, we would use visitAssignmentExpression.
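const { readFileSync } = require('fs');
const recast = require('recast');
const { Parser } = require('acorn');
// fileName is the path of the file to analyze; acorn 8 expects an explicit ecmaVersion
const ast = Parser.parse(readFileSync(fileName).toString(), { ecmaVersion: 2020 });
recast.visit(
  ast,
  {
    visitFunctionDeclaration: (path) => {
      // the navigation code here...
      // return false to stop at this depth
      return false;
    }
  }
)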
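The idea of keeping only the nodes of one type can also be sketched without any library. The snippet below is an illustration under assumptions, not how recast works internally: the AST is hand-written and abbreviated, and `collectByType` is a hypothetical helper name. Note that it searches at every depth, so nested functions are found too.

```javascript
// Abbreviated hand-written AST: decrementAndAdd contains a nested add function.
const ast = {
  type: 'Program',
  body: [{
    type: 'FunctionDeclaration',
    id: { type: 'Identifier', name: 'decrementAndAdd' },
    params: [],
    body: {
      type: 'BlockStatement',
      body: [{
        type: 'FunctionDeclaration',
        id: { type: 'Identifier', name: 'add' },
        params: [],
        body: { type: 'BlockStatement', body: [] },
      }],
    },
  }],
};

// Hypothetical helper: depth-first search that keeps only nodes of one type.
function collectByType(node, wantedType, found = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectByType(child, wantedType, found));
  } else if (node && typeof node === 'object') {
    if (node.type === wantedType) found.push(node);
    Object.values(node).forEach((child) => collectByType(child, wantedType, found));
  }
  return found;
}

const names = collectByType(ast, 'FunctionDeclaration').map((fn) => fn.id.name);
console.log(names); // [ 'decrementAndAdd', 'add' ]
```

A visitor library gives the same filtering through named methods such as visitFunctionDeclaration, plus control over how deep the search goes.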
Usually, the names of the node types are not obvious. One can use ast-explorer to look up the types needed. Just paste your code in the left panel, select the parser you are using, and “voilà!”. Browse the parsed code on the right and find the node type you’re looking for.
We don’t always want to look at every level of the tree. Sometimes we want to do a deep search while other times we just want to look at the top layer. Depending on the framework, the syntax differs. Fortunately, it’s usually well documented.
With recast, if we want to stop searching at the current depth, we just return false when we are done. This is what we did before. If we want to traverse through (go deep), we can use this.traverse(path), as in the example below.
With @babel/traverse, there is no need to tell babel where to continue: it descends into child nodes by default. One only needs to specify where to stop, by calling path.skip() to ignore the children of the current node or path.stop() to end the traversal altogether.
recast.visit(
  ast,
  {
    // a regular method, not an arrow function, so that `this` is the visitor
    visitFunctionDeclaration(path) {
      // deal with the path here...
      // run the visit on every child node
      this.traverse(path);
    }
  }
)
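To see what the "stop here" versus "go deep" choice changes in practice, here is a small dependency-free sketch. It is a simplification under assumptions: `visit` and `fn` are hypothetical helpers, the AST is built by hand, and the walker descends by default unless a handler returns false (recast instead expects an explicit this.traverse(path) or return false).

```javascript
// Hypothetical builder for abbreviated FunctionDeclaration nodes.
const fn = (name, innerStatements = []) => ({
  type: 'FunctionDeclaration',
  id: { type: 'Identifier', name },
  params: [],
  body: { type: 'BlockStatement', body: innerStatements },
});

const ast = {
  type: 'Program',
  body: [
    fn('decrementAndAdd', [fn('add')]),
    fn('incrementAndMultiply', [fn('multiply')]),
  ],
};

// Minimal walker: calls visitor['visit' + node.type] when it exists,
// and skips the node's children when that handler returns false.
function visit(node, visitor) {
  if (Array.isArray(node)) {
    node.forEach((child) => visit(child, visitor));
    return;
  }
  if (!node || typeof node !== 'object') return;

  const handler = visitor['visit' + node.type];
  const keepGoing = typeof handler === 'function' ? handler(node) !== false : true;
  if (keepGoing) {
    Object.values(node).forEach((child) => visit(child, visitor));
  }
}

// Shallow search: returning false prunes everything inside each function.
const topLevel = [];
visit(ast, {
  visitFunctionDeclaration(node) {
    topLevel.push(node.id.name);
    return false;
  },
});

// Deep search: no `return false`, so nested functions are visited too.
const all = [];
visit(ast, {
  visitFunctionDeclaration(node) {
    all.push(node.id.name);
  },
});

console.log(topLevel); // [ 'decrementAndAdd', 'incrementAndMultiply' ]
console.log(all);      // [ 'decrementAndAdd', 'add', 'incrementAndMultiply', 'multiply' ]
```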
We went from a very broad search to a smaller sample. We can now extract the data we need.
The path object passed to visitFunctionDeclaration is a NodePath. This object represents the connection between a parent and a child AST node. The path on its own is of no use to us, because it only describes where the function declaration sits in the tree; the declaration itself is the node it wraps.
Using ast-explorer, we find the contents of the path we are looking for.
The classic thing to do is to read path.node. It gets the child node in the parent-child relationship. If you chose to search for functions, the node in path.node will be of type FunctionDeclaration:
const functionNames = [];
recast.visit(
  ast,
  {
    visitFunctionDeclaration: (path) => {
      console.log(path.node.type); // will print "FunctionDeclaration"
      functionNames.push(path.node.id.name); // will add the name of the function to the array
      // return false to avoid looking inside of the functions body
      // we stop our search at this level
      return false;
    }
  }
)
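The difference between a path and the node it wraps can be sketched with a toy model. Everything below is an assumption made for illustration: `makePath` and `walkPaths` are hypothetical helpers, loosely inspired by the `node` and `parentPath` properties of a NodePath, and not the library's implementation. The sketch also shows another way to spot global functions: their parent node is the Program.

```javascript
// Hypothetical builder for abbreviated FunctionDeclaration nodes.
const fn = (name, innerStatements = []) => ({
  type: 'FunctionDeclaration',
  id: { type: 'Identifier', name },
  params: [],
  body: { type: 'BlockStatement', body: innerStatements },
});

const ast = {
  type: 'Program',
  body: [
    fn('decrementAndAdd', [fn('add')]),
    fn('incrementAndMultiply', [fn('multiply')]),
  ],
};

// Toy stand-in for a NodePath: the node itself plus a link to its parent's path.
function makePath(node, parentPath) {
  return { node, parentPath };
}

// Visit every node, handing the callback a path instead of a bare node.
function walkPaths(path, callback) {
  callback(path);
  for (const value of Object.values(path.node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (child && typeof child.type === 'string') {
        walkPaths(makePath(child, path), callback);
      }
    }
  }
}

const globalFunctions = [];
walkPaths(makePath(ast, null), (path) => {
  const isFunction = path.node.type === 'FunctionDeclaration';
  const parentIsProgram =
    path.parentPath !== null && path.parentPath.node.type === 'Program';
  if (isFunction && parentIsProgram) {
    globalFunctions.push(path.node.id.name);
  }
});

console.log(globalFunctions); // [ 'decrementAndAdd', 'incrementAndMultiply' ]
```

The node answers "what is this?", while the path answers "where is this?", which is why the name is read from path.node rather than from the path itself.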
Try nesting traversals inside each other to look at subtrees. The code below collects every function that is exactly on the second level down, meaning declared directly inside a top-level function. It would not find a function in a function in a function:
const functionNames = [];
recast.visit(
  ast,
  {
    visitFunctionDeclaration: (path) => {
      let newPath = path.get('body');
      // subtraversing
      recast.visit(
        newPath,
        {
          visitFunctionDeclaration: (path) => {
            functionNames.push(path.node.id.name);
            return false;
          }
        }
      )
      // return false to not look at other functions contained in this function
      // leave this role to the sub-traversing
      return false;
    }
  }
)
Mission Accomplished!!
We programmatically found all the function names. We could just as easily find the names of the arguments or the exposed variables.
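As a rough sketch of that last point, the snippet below reads parameter names and top-level variable names from a hand-written, abbreviated AST. The `MAX_ITEMS` variable and the `exposed` result object are made up for the illustration, and only plain identifiers are handled (no destructuring or default values).

```javascript
// Hand-written, abbreviated AST for:
//   var MAX_ITEMS = 10;
//   function decrementAndAdd(a, b) { ... }
const ast = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'var',
      declarations: [{
        type: 'VariableDeclarator',
        id: { type: 'Identifier', name: 'MAX_ITEMS' },
        init: { type: 'Literal', value: 10 },
      }],
    },
    {
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'decrementAndAdd' },
      params: [
        { type: 'Identifier', name: 'a' },
        { type: 'Identifier', name: 'b' },
      ],
      body: { type: 'BlockStatement', body: [] },
    },
  ],
};

// Only direct children of the Program are inspected, so nothing
// declared inside a function body is reported.
const exposed = { functions: {}, variables: [] };
for (const statement of ast.body) {
  if (statement.type === 'FunctionDeclaration') {
    exposed.functions[statement.id.name] = statement.params
      .filter((param) => param.type === 'Identifier')
      .map((param) => param.name);
  } else if (statement.type === 'VariableDeclaration') {
    for (const declarator of statement.declarations) {
      if (declarator.id.type === 'Identifier') {
        exposed.variables.push(declarator.id.name);
      }
    }
  }
}

console.log(exposed.functions); // { decrementAndAdd: [ 'a', 'b' ] }
console.log(exposed.variables); // [ 'MAX_ITEMS' ]
```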
AST Node: one object in a tree. Examples: function declaration, variable assignment, object expression.
NodePath: the link between a parent Node and a child Node in a tree.
NodeProperty: part of the definition of a node. Depending on the node, it can hold just a name or more information.