When you make a deployment, by running now with our CLI, making an API call or with our GitHub integration, we synchronize the source code and optionally build them to your specification.

The outputs of this build process can be static files or serverless lambdas. Static files will be served directly from our edge caches, while lambdas contain code that gets executed dynamically and on-demand.

Lambdas allow your code to be executed on-demand and scale automatically. They minimize cost in terms of infrastructure spend and engineering time, due to largely removing operational overhead and reducing the surface of error.

Now automatically provisions lambdas with the underlying cloud infrastructure provider for:

  • Dynamic code execution (such as deployments that expose APIs)
  • Builds (e.g., to run static site generators or compile code in the cloud)


If you have these files in your deployment's directory:

  • test.html
  • contact.php
  • date.php
  • api/index.js and api/package.json

We will want test.html to be served statically from our CDN, but the rest to invoke code dynamically as they are requested.

The PHP files can contain any valid PHP code. api/index.js needs just to export a function:

module.exports = (req, res) => {
  res.end('Welcome to my API')

We can then specify the following builds in now.json:

  "version": 2,
  "builds": [
    { "src": "test.html", "use": "@now/static" },
    { "src": "*.php", "use": "@now/php" },
    { "src": "api/index.js", "use": "@now/node" }

When we run now, our output looks like this:

A deployment yielding a static file and Now Lambdas.

The build processes, as we can see, happen in parallel and all yield independent outputs. Our deployment then serves those.

Since we have no index files, when we go to our deployment URL we will get a directory listing:

A deployment yielding a static file and Now Lambdas.

As we can see, our built lambdas (represented by the icon λ) show up as "files" alongside our static ones. When a request is received by them, the code package, deployed to the specified regions, is executed and its response returned.

It is helpful to think of the generated lambdas as "lightweight binaries" that get executed (or typically called invoked) on demand.

With Now, applications that are typically built as servers and containers can seamlessly be compiled into serverless functions under the hood without any code changes, thanks to our builders.

Conceptual Model

Lambdas as Threads

Cloud infrastructure providers typically give us access to several layers of abstraction that map to the traditional counterparts:

  • Set of Computers → Cluster
  • Computer → VM Instance (e.g., EC2)
  • Process → Container
  • Thread → Function

The Now Platform focuses on the last one: the serverless function. If we think about it in terms of primitives, it closely mirrors a thread of computation.

We have typically been able to provision entire computers or orchestrate processes, but a faster, more granular and more parallelizable primitive like the thread has only recently become available.

Versus Processes and Containers

Lambdas have a lot of benefits over booting up and managing containers or processes, let alone computers or clusters.

One of the main benefits is that the developer only needs to reason about one concurrent computation happening in the context of their code, despite many of them being able to run in parallel.

In the example above, api/index.js handles an incoming request, runs some computation, and responds. If another request comes to the same path, the system will automatically spawn a new isolated function, thus scaling automatically.

In contrast, processes and containers tend to expose an entire server as their entrypoint. That server would then define code for handling many types of requests.

The code you specify for handling those requests would share one common context and state. Scaling becomes harder because it is hard to decide how many concurrent requests the process can handle.

Furthermore, if a systemic error occurs when handling an incoming request or event, the entire process or container would be affected, and with it many other requests being handled inside it.

Per-Entrypoint Granularity

Now makes it trivial to build one serverless function per potential entrypoint of your application. If you are familiar with frontend development, you can think of this as code-splitting for the backend.

In the example above, the three dynamic entrypoints we specified (two PHP scripts and the Node.js function) result in three discrete packaged functions that get uploaded and deployed with the cloud infrastructure provider.

This has numerous benefits:

Lifecycle and Scalability

In order to scale, it is widely accepted and understood that more copies of your software need to spawn and the incoming load of traffic balanced between them. This is known as horizontal scalability.

With traditional process-based architectures, scaling like this is possible but has significant drawbacks and challenges:

  • Auto-scaling algorithms are based on high-level metrics like CPU or Memory usage. They might scale too eagerly or not quickly enough, therefore not serving some portion of the traffic in an acceptable amount of time
  • The developer loses the ability to understand and predict exactly how their software will scale since the metrics as mentioned above are very dynamic
  • The developer might have to spend a lot of time and resources carefully running simulations, setting up metric dashboards, ensuring the algorithms are fine-tuned

Typically, when you spawn a server, worker, container, or process, the requests end up all combined into one process, spawned across several cores and computers

A typical request model.

With the Now Lambda architecture, the mental model is much simpler. For each incoming request, a new invocation happens.

The Now Lambda request model.

If a request happens quickly thereafter, the lambda is re-used for a subsequent invocation. After a while, only enough lambdas will stay around to meet the needs of incoming traffic.

If no more traffic comes, lambdas (unlike nearly all alternative systems) will scale to zero. This means that you only pay precisely for the level of scale that your traffic demands.

Cold and Hot Boots

When a lambda boots up from scratch, that is known as a cold boot. When it is re-used, we can say the lambda was warm.

Re-using a lambda means the underlying container that hosts it does not get discarded. State gets preserved (such as temporary files, memory caches, sub-processes). This empowers the developer not just to minimize the time spent in the booting process, but to also take advantage of caching data (in memory or /tmp filesystem) and memoizing expensive computations.

It is important to note that lambdas, even while the underlying container is hot, cannot leave tasks running. If a sub-process is running by the time the response is returned, the entire container is frozen. When a new invocation happens, if the container is re-used, it is unfrozen, which allows sub-processes to continue running.

To minimize the latency difference between a cold and boot instantiation, Now aids by:

  • Designing a build system that has a per-entrypoint granularity. This means that when a user hits a particular endpoint, only the code associated to handle that request has to be fetched and booted.
  • Imposing a limit on max bundle size for each lambda generated by your deployment's builds. Cloud providers allow for a variety of different sized functions, but we have picked one that is aligned with making cold fast instantiation nearly instant for user-facing workloads (such as serving HTTP traffic).

Capabilities and Limits

CPU / Memory

Now configures lambdas to have 3008mb of available memory for builds and output lambdas. The amount of CPU allocated is proportional to the max memory limit.

To understand how the CPU allocation will work for your application, we recommend testing and benchmarking.

Maximum Request Body Size

Each request body has a maximum payload limit of 5mb.

Maximum Bundle Size

The maximum lambda size is 50mb. If a deployment build tries to create a larger lambda as an output, the build will fail and therefore so will your deployment (see the deployment lifecycle documentation for details).

Maximum Execution Duration

A lambda can execute for at most 5 minutes. This is not configurable as of now. If the limit is reached for a particular invocation, it will result in an error 504.

In-memory Caches

It is possible to take advantage of global state from invocation to invocation if the underlying container is re-used.


The filesystem is writable inside /tmp and persisted if the underlying container is re-used. The /tmp directory is writable for up to 512mb of storage.


All output streams are captured and persisted. Each function can store up to 1,000 lines. Each deployment will aggregate up to 1,000 lines. Each line is defined as a string chunk of up to 1kb.