nodejs cli

Node.js is a target platform similar to the jvm. Node.js offers a large ecosystem of packages for building backend and frontend programs. With fast startup times, it is becoming a very popular target for scalajs backend programs.

"npm" is the package manager for node. npm installs both packages that are consumed as libraries and cli programs, such as npm itself. Programs and packages can be installed globally or locally. Local installation helps with repeatable builds and reduces the disk footprint, but global mode is popular for cli programs.

nodejs programs are easily deployed to npm as compressed tar files (.tgz), so it's fairly easy to share work; no signatures or other setup steps are necessary. Installing an npm-distributed program typically involves npm downloading the .tgz file, running the "install" command to download dependencies, and placing the cli program into a common area such as ./node_modules/.bin. Most node.js programs do not bundle their dependencies directly into their artifacts. They rely on npm to install the dependencies described in package.json.

Traditional nodejs command line programs are javascript files with `#!/usr/bin/env node` headers that allow them to be run directly in a unix shell environment. The cli programs are often written directly in the shebang (#!) format, but some are not. The programs `require(...)` additional nodejs modules or the developer's own js files as needed. Some nodejs programs are produced by build systems such as webpack.
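As a minimal sketch of the shebang format described above, the script below writes a tiny CLI file to a scratch directory and runs it. The file name greet.js and its greeting logic are hypothetical and only serve as an illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

// Source of a hypothetical CLI file. The first line is the shebang that lets
// a unix shell hand the file to whichever `node` is first on the PATH.
const cliSource = [
  '#!/usr/bin/env node',
  "const name = process.argv[2] || 'world';",
  "console.log('hello ' + name);",
  ''
].join('\n');

// Write the file to a scratch directory and mark it executable (unix only).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cli-demo-'));
const cliPath = path.join(dir, 'greet.js');
fs.writeFileSync(cliPath, cliSource);
fs.chmodSync(cliPath, 0o755);

// Node skips the shebang line, so the same file also runs as `node greet.js`.
const output = execFileSync(process.execPath, [cliPath, 'scalajs'], { encoding: 'utf8' });
console.log(output.trim()); // prints "hello scalajs"
```

On a unix system the generated file could also be started directly as ./greet.js, because the executable bit is set and the shell reads the shebang.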

nodejs cli apps compete with python cli programs, unix shell scripts, native executables and powershell programs (on windows and linux targets). If the libraries you want to use are in js and you use js for other things such as client or server applications, then creating nodejs cli applications is a natural extension. Moving "off-platform", in this case away from javascript based tooling, can increase costs and time to market. In fact, we will see that there is some friction in using scala's sbt tooling when developing nodejs targeted cli applications, but we need sbt for scalajs compilation.

You can also create platform-specific executables using Zeit's pkg, which is described here.

Anatomy of a nodejs CLI program

Nodejs programs come in all shapes and sizes with regard to their deployment structure. Generally, dependencies from outside the application are resolved via npm using the standard node_modules search strategy, which is well documented in the node documentation.

The deployment model for the actual CLI program falls into a couple of different models. All of these models require a bin entry in the package.json so that npm knows which .js file is the "main" entry point.

  • A single index.js file in the toplevel directory. This is also the .js file that is loaded by default when doing a require('mod') in nodejs code, unless the "main" field in package.json points elsewhere.

  • A single cli.js file. In this case, the file is not called index.js. npm would create a symlink from node_modules/.bin/cli to node_modules/mod/cli.js.

  • A collection of .js files in the toplevel module directory along with the main file cli.js. The "requires" in the cli.js file use require('./dependencya') to bring the contents in. cli.js and dependencya.js are in the same toplevel directory.

  • A single file or collection of files in a bin directory at the top level of the module. For example, you may have node_modules/mod/bin/cli.js and node_modules/mod/bin/dependencya.js. cli.js may also use require('../lib/dependencya') to bring the dependency in from a "lib" dir which is not in the same directory as the main .js file.
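To make the bin entry concrete, here is a rough sketch of a package.json (written as a javascript object) for the bin directory model, together with a small helper that computes where a local install would place the link. The helper binLinks is hypothetical and only models npm's documented behavior for unscoped package names; it is not npm's own code.

```javascript
const path = require('path');

// Hypothetical package.json contents for a module named "mod".
const pkg = {
  name: 'mod',
  version: '0.1.0',
  bin: { cli: './bin/cli.js' }
};

// Model of the links a local install creates: one entry per command in "bin".
// When "bin" is a plain string, the command takes the package name.
function binLinks(pkg, nodeModulesDir) {
  const bin = typeof pkg.bin === 'string' ? { [pkg.name]: pkg.bin } : (pkg.bin || {});
  return Object.entries(bin).map(([command, file]) => ({
    link: path.posix.join(nodeModulesDir, '.bin', command),
    target: path.posix.join(nodeModulesDir, pkg.name, file)
  }));
}

const links = binLinks(pkg, 'node_modules');
console.log(JSON.stringify(pkg, null, 2));
console.log(links);
// links[0] is { link: 'node_modules/.bin/cli', target: 'node_modules/mod/bin/cli.js' }
```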

Import is processed relative to the requesting file when a relative path (like ./ or ../lib) is used in the import. Import processing uses the standard nodejs search algorithm otherwise (that involves node_modules).
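The two resolution modes can be observed with node's own createRequire function. The sketch below builds a scratch module tree and asks node where each request would resolve when issued from cli.js. The package name somedep and the scratch directory layout are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { createRequire } = require('module');

// Scratch copy of a hypothetical module tree:
//   mod/bin/cli.js
//   mod/lib/dependencya.js
//   mod/node_modules/somedep/index.js
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'resolve-demo-')));
const files = {
  'mod/bin/cli.js': '',
  'mod/lib/dependencya.js': 'module.exports = {};',
  'mod/node_modules/somedep/index.js': 'module.exports = {};'
};
for (const [rel, body] of Object.entries(files)) {
  const abs = path.join(root, rel);
  fs.mkdirSync(path.dirname(abs), { recursive: true });
  fs.writeFileSync(abs, body);
}

// A require function that resolves as if it were called from inside cli.js.
const requireFromCli = createRequire(path.join(root, 'mod/bin/cli.js'));

// Relative path: resolved against the directory of the requesting file.
const relativeHit = requireFromCli.resolve('../lib/dependencya');
// Bare name: found by walking up the node_modules directories.
const packageHit = requireFromCli.resolve('somedep');

console.log(path.relative(root, relativeHit));
console.log(path.relative(root, packageHit));
```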

jvm based build tools typically output to a target directory, or perform final assembly into target/web/public for static assets and target/web/stage for final artifacts. The idea is that by using a single output directory for the build, you can easily clean up the build location using sbt clean. Deployment often involves jar'ing or zipping up a tree of resources that contains only the output artifacts. In addition, since jvm artifacts are binary and not easily readable, there is rarely a requirement to keep them in the source repository. Of course, this is in stark contrast to nodejs github projects, which can be cloned, run through "npm install" to install dependencies, and then used immediately as a .js program.

When most nodejs CLI programs are installed by npm, they can be rebuilt using the "installed" directory (under node_modules) and be instantly usable. This helps in the scenario where you grab your dependency module from github and place it into the node_modules directory of another project manually. In this case, the module must be usable at the point of download.

So it's fairly clear that we need to persist output artifacts in our output tree and source repository so they have the same usability "look and feel" as other nodejs programs.

If we were to output CLI artifacts only to a separate tree, such as under target, we would need to persist target in the tarball that is uploaded to npm so that upon install, the package is instantly usable without a rebuild. npm installs can run post-install scripts that could rebuild the project (and in some cases recompile 'C' code). However, it is highly doubtful that users of a scalajs CLI program would have a scala development environment on hand.

npm deployments are tarballs (.tgz) of content. It is not uncommon for the tarballs to have the entire source tree and final artifact tree present since many of the final artifacts are just nodejs friendly .js files to begin with. nodejs CLI programs that need transpilation of some sort often heavily employ the .npmignore file to have the tarball deployment process ignore build artifacts and other unnecessary files.

Based on the common way that .js files are deployed and because we need a transpilation step, among other things, it is probably best to create a "target" directory structure that can be easily tarballed up while still being instantly usable if you were to clone the project from github and run npm install in the toplevel directory.

scalajs-nodejs CLI programs

All of the above commentary means we have a bit of a problem. nodejs import statements process relative to the "requesting" module. If we produce an output artifact using sbt-scalajs at "target/scala-2.12/app-opt.js" and our other .js dependencies are stuck in a standard location like "src/main/resources/..." then running the app-opt.js as the main entry point will cause module resolution problems in the most general case of a CLI program.
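The failure can be reproduced without sbt. The sketch below fakes the two locations in a scratch directory and tries to load the entry point. The content of app-opt.js is a hand-written stand-in for real scalajs output, and the exact path under src/main/resources is assumed for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Scratch tree that mimics the sbt layout described above.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'sbt-layout-'));
const files = {
  // Stand-in for the scalajs output; it asks for a sibling file.
  'target/scala-2.12/app-opt.js': "module.exports = require('./dependencya');",
  // The dependency lives in the source tree, not next to the output.
  'src/main/resources/dependencya.js': "module.exports = 'dependencya loaded';"
};
for (const [rel, body] of Object.entries(files)) {
  const abs = path.join(root, rel);
  fs.mkdirSync(path.dirname(abs), { recursive: true });
  fs.writeFileSync(abs, body);
}

// './dependencya' is looked up in target/scala-2.12, where it does not exist.
let errorCode = null;
try {
  require(path.join(root, 'target/scala-2.12/app-opt.js'));
} catch (err) {
  errorCode = err.code;
}
console.log(errorCode); // prints "MODULE_NOT_FOUND"
```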

  • We could copy the .js dependencies to the target directory but then we have a very messy target directory that we would need to "slice" out when creating the tarball. It also feels highly duplicative.

  • Similar to sbt-web, we could create an output tree, say in a toplevel "cli" directory that builds the artifacts into a tree where resolution will work correctly. The CLI directory could persist in git and be linked through the "bin" entry in package.json.

  • We could "bundle" our application .js files into the sbt-scalajs output artifact and move the output artifact to the toplevel directory. In that case, the output artifact has no other module dependencies within the project, but may still have nodejs module dependencies.

  • We could bundle everything, application .js dependencies and nodejs dependencies, into a single output file that is copied to the toplevel directory (or wherever). This option will not work in general because some nodejs modules depend on locally available files that are accessed via the nodejs fs module vs a "require/import" mechanism.

Overall, it's safest to assume that npm modules should not be bundled into our artifacts and that normal npm import resolution should be employed. Our only real decision is where to place our .js dependencies (or really any set of dependencies) such that we can test incrementally, create a clean "production" output tree when pushing to the npm repository, and persist the artifacts so that cloning from github and running "npm install" makes the CLI program usable.

Since scalajs compilation is a very time consuming process, we need an approach that works for both production and development. Since our scalajs output needs to both consume modules and potentially serve itself up as a module (via exports from scala), we need a tightly integrated output tree. We could use $NODE_PATH, similar to the jvm classpath, as a clever way to stitch together module resolution, but it may just be easier to create a tree that is very, very close to what we need to start with.
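For comparison, this is roughly what the $NODE_PATH alternative looks like. The sketch starts a child node process with NODE_PATH pointing at a scratch lib directory, so a bare require('dependencya') succeeds even though no node_modules directory contains it. The directory and file names are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

// Scratch directory with a single module that sits outside any node_modules.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'node-path-demo-'));
const libDir = path.join(root, 'lib');
fs.mkdirSync(libDir);
fs.writeFileSync(
  path.join(libDir, 'dependencya.js'),
  "module.exports = 'found via NODE_PATH';"
);

// NODE_PATH is read when node starts, so it is set on a child process.
const output = execFileSync(
  process.execPath,
  ['-e', "console.log(require('dependencya'))"],
  {
    cwd: root,
    env: Object.assign({}, process.env, { NODE_PATH: libDir }),
    encoding: 'utf8'
  }
);
console.log(output.trim()); // prints "found via NODE_PATH"
```

The drawback is that every user of the CLI program would need the same environment variable, which is one reason a self-contained tree is easier.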

Since we need to cover multiple deployment scenarios (npm install via the npm repository, github cloning and npm global install), let's assume that we have:

  • src: Holds our .scala files and any other files that need processing by sbt or webpack.

  • bin directory: Holds our production entry points. For scalajs programs the entry point is usually the output of scalajs. You could also write a launcher file in pure .js that calls the entry point in scalajs.

  • lib directory: Holds static module resources that do not need any processing. Pure .js files can go here. When requiring them from the main scalajs module, you would want to use @JSImport("../lib/.../dependencya").
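The layout above can be tried out with plain .js files standing in for the scalajs output. In this sketch bin/app-opt.js is a hand-written placeholder for what sbt-scalajs would emit, bin/cli.js is a pure .js launcher, and the exported main function and the greet helper are hypothetical names.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

// Scratch copy of the bin/lib layout.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'layout-demo-'));
const files = {
  // Static module resource that needs no processing.
  'lib/dependencya.js': "exports.greet = (name) => 'hello ' + name;",
  // Placeholder for the scalajs output; it reaches into ../lib by relative path.
  'bin/app-opt.js': [
    "const dep = require('../lib/dependencya');",
    "exports.main = (args) => dep.greet(args[0] || 'world');"
  ].join('\n'),
  // Pure .js launcher that calls the entry point exported above.
  'bin/cli.js': [
    '#!/usr/bin/env node',
    "console.log(require('./app-opt.js').main(process.argv.slice(2)));"
  ].join('\n')
};
for (const [rel, body] of Object.entries(files)) {
  const abs = path.join(root, rel);
  fs.mkdirSync(path.dirname(abs), { recursive: true });
  fs.writeFileSync(abs, body + '\n');
}

// Run the launcher the way the "bin" entry in package.json would.
const output = execFileSync(
  process.execPath,
  [path.join(root, 'bin', 'cli.js'), 'npm'],
  { encoding: 'utf8' }
);
console.log(output.trim()); // prints "hello npm"
```

Because every request is relative to the requesting file, the same tree works whether it sits in a github clone, under node_modules after a local install, or in the global install location.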
