AWS Step Functions Notes

Intro

Step Functions is a diagram-based system for managing complex, state-based workflows that use various AWS services. AWS describes it as a “serverless orchestration service”. It supports branching, error handling, parallel processing, and service integration.

step functions diagram

📘 AWS Step Functions Documentation

📘 Step Functions Data Flow Simulator

Amazon States Language

A JSON-based language used to define the states of a Step Functions state machine and how data moves between them.

📘 Amazon States Language

It used to be the case that all step functions had to be written by hand in this language. Now there is a GUI called Workflow Studio, which lets you assemble step functions (partially) by dragging and dropping and then generates the majority of the Amazon States Language for you.

State Types

Pass

A Pass state "Type": "Pass" passes its input to its output, without performing work. Pass states are useful when constructing and debugging state machines.
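As a rough sketch of the debugging use case, a Pass state can stand in for a step that has not been built yet by injecting fixed data. The state name and values here are made up for illustration; Result and ResultPath are standard Pass state fields.

```json
{
  "Comment": "Illustrative only: the state name and injected values are hypothetical",
  "StartAt": "StubLookup",
  "States": {
    "StubLookup": {
      "Type": "Pass",
      "Result": { "customer_id": "c-123", "tier": "gold" },
      "ResultPath": "$.customer",
      "End": true
    }
  }
}
```

Given an input of {"order_id": 1}, this state should pass the input through with the stub data added under customer.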

Task

A Task state "Type": "Task" represents a single unit of work performed by a state machine.

All work in your state machine is done by tasks. A task performs work by using an activity or an AWS Lambda function, or by passing parameters to the API actions of other services.

The Amazon States Language represents tasks by setting a state’s type to Task and by providing the task with the Amazon Resource Name (ARN) of the activity or Lambda function.

A Task state must set either the End field to true if the state ends the execution, or must provide a state in the Next field that is run when the Task state is complete.
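A minimal sketch of a Task state that calls a Lambda function and then moves on to a terminal state. The ARN, account number, and state names are placeholders, not values from a real account.

```json
{
  "Comment": "Illustrative only: the Lambda ARN and state names are hypothetical",
  "StartAt": "ResizeImage",
  "States": {
    "ResizeImage": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:resize-image",
      "Next": "Done"
    },
    "Done": {
      "Type": "Succeed"
    }
  }
}
```

Replacing "Next": "Done" with "End": true would make the task itself the last step.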

Choice

A Choice state "Type": "Choice" adds branching logic to a state machine.
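A small sketch of what that branching can look like. The field names (status, amount) and the target states are assumptions for illustration. Rules are evaluated in order, the first match wins, and Default is used when nothing matches.

```json
{
  "Comment": "Illustrative only: field names and state names are hypothetical",
  "StartAt": "CheckStatus",
  "States": {
    "CheckStatus": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.status",
          "StringEquals": "APPROVED",
          "Next": "Approved"
        },
        {
          "Variable": "$.amount",
          "NumericGreaterThan": 1000,
          "Next": "NeedsReview"
        }
      ],
      "Default": "Rejected"
    },
    "Approved": { "Type": "Succeed" },
    "NeedsReview": { "Type": "Succeed" },
    "Rejected": {
      "Type": "Fail",
      "Error": "OrderRejected",
      "Cause": "No approval rule matched"
    }
  }
}
```

Note that a Choice state has no Next or End field of its own; every transition comes from a rule or from Default.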

Wait

A Wait state "Type": "Wait" delays the state machine from continuing for a specified time. You can choose either a relative time, specified in seconds from when the state begins, or an absolute end time, specified as a timestamp.
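Both flavors of waiting can be sketched like this. The state names and the timestamp are arbitrary examples. SecondsPath and TimestampPath are the variants to reach for when the value should come from the state input instead of being hard-coded.

```json
{
  "Comment": "Illustrative only: state names and the timestamp are hypothetical",
  "StartAt": "WaitThirtySeconds",
  "States": {
    "WaitThirtySeconds": {
      "Type": "Wait",
      "Seconds": 30,
      "Next": "WaitUntilDeadline"
    },
    "WaitUntilDeadline": {
      "Type": "Wait",
      "Timestamp": "2030-01-01T00:00:00Z",
      "End": true
    }
  }
}
```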

Succeed

A Succeed state "Type": "Succeed" stops an execution successfully. The Succeed state is a useful target for Choice state branches that don’t do anything but stop the execution.

Because Succeed states are terminal states, they have no Next field, and don’t need an End field, as shown in the following example.

"SuccessState": {
  "Type": "Succeed"
}

Fail

A Fail state "Type": "Fail" stops the execution of the state machine and marks it as a failure, unless it is caught by a Catch block.

Parallel

The Parallel state "Type": "Parallel" can be used to create parallel branches of execution in your state machine.
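A minimal sketch with two branches, using Pass states as stand-ins for real work. All names and values are hypothetical. Each branch is a small state machine of its own with its own StartAt and States.

```json
{
  "Comment": "Illustrative only: branch names, state names, and values are hypothetical",
  "StartAt": "LookupCustomer",
  "States": {
    "LookupCustomer": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "LookupAddress",
          "States": {
            "LookupAddress": {
              "Type": "Pass",
              "Result": { "city": "Seattle" },
              "End": true
            }
          }
        },
        {
          "StartAt": "LookupPhone",
          "States": {
            "LookupPhone": {
              "Type": "Pass",
              "Result": { "phone": "555-0100" },
              "End": true
            }
          }
        }
      ],
      "End": true
    }
  }
}
```

The output of the Parallel state is an array with one entry per branch, in the order the branches are listed.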

Map

The Map state "Type": "Map" can be used to run a set of steps for each element of an input array. While the Parallel state executes multiple branches of steps using the same input, a Map state will execute the same steps for multiple entries of an array in the state input.
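Here is a sketch of a Map state that runs one step per element of an orders array. The input shape and all names are assumptions. Older definitions use an Iterator field where this one uses ItemProcessor.

```json
{
  "Comment": "Illustrative only: the orders array and state names are hypothetical",
  "StartAt": "ProcessOrders",
  "States": {
    "ProcessOrders": {
      "Type": "Map",
      "ItemsPath": "$.orders",
      "MaxConcurrency": 5,
      "ItemProcessor": {
        "ProcessorConfig": { "Mode": "INLINE" },
        "StartAt": "TagOrder",
        "States": {
          "TagOrder": {
            "Type": "Pass",
            "Parameters": {
              "order_id.$": "$.id",
              "status": "PROCESSED"
            },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
```

With an input of {"orders": [{"id": "a"}, {"id": "b"}]}, each iteration receives one order as its input, and the Map state outputs an array of the per-iteration results.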

Input and Output Processing

The output of a state becomes the input into the next state. However, you can restrict states to working on a subset of the input data by using Input and Output Processing.

In the Amazon States Language, these fields filter and control the flow of JSON from state to state:

  • InputPath
  • OutputPath
  • ResultPath
  • Parameters
  • ResultSelector

The following diagram shows how JSON information moves through a task state. InputPath selects which parts of the JSON input to pass to the task of the Task state (for example, an AWS Lambda function). ResultPath then selects what combination of the state input and the task result to pass to the output. OutputPath can filter the JSON output to further limit the information that’s passed to the output.

input and output json data flow diagram

InputPath, Parameters, ResultSelector, ResultPath, and OutputPath each manipulate JSON as it moves through each state in your workflow.

Each can use paths to select portions of the JSON from the input or the result. A path is a string, beginning with $, that identifies nodes within JSON text. Step Functions paths use JsonPath syntax.
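To make the five fields concrete, here is a sketch of a single Task state that uses all of them. The function name, the shape of the input (an order object), and the shape of the Lambda response (a total field) are assumptions for illustration only.

```json
{
  "Comment": "Illustrative only: function name, field names, and state name are hypothetical",
  "StartAt": "PriceOrder",
  "States": {
    "PriceOrder": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "InputPath": "$.order",
      "Parameters": {
        "FunctionName": "price-order",
        "Payload": {
          "order_id.$": "$.id",
          "items.$": "$.items"
        }
      },
      "ResultSelector": {
        "total.$": "$.Payload.total"
      },
      "ResultPath": "$.pricing",
      "OutputPath": "$.pricing",
      "End": true
    }
  }
}
```

Reading top to bottom: InputPath narrows the input to the order object, Parameters builds the request sent to Lambda, ResultSelector keeps only the total from the Lambda response, ResultPath stores that under pricing alongside the state input, and OutputPath passes on only the pricing node.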

📘 Step Functions Data Flow Simulator

Intrinsic Functions

While the Amazon States Language (ASL) is actually very easy to learn (it is mostly just mapping items into and out of a JSON object), there are a few built-in or “intrinsic” functions that are quite useful, or even necessary at times.

ASL Intrinsic Functions

States.Format

This is a simple concatenation function so that you can string together a couple pieces of state into one string without having to create another task step to accomplish this. When would this actually be useful? One example could be constructing an API request URL string. Another could be adding quotes around a number to turn it into a string (as required for DynamoDB oftentimes when not using the Document Client).

For example, say I have a number in my state and I want to use the built-in DynamoDB actions to update an item in a table. Without the Document Client, DynamoDB expects every attribute value, numbers included, to be passed as a string, so we could do the following.

{
  "TableName": "profiles_metadata",
  "Key": {
    "profile_id": {
      "N.$": "States.Format('{}', $.my_profile_number)"
    }
  },
  "UpdateExpression": "SET #e6b00 = :e6b00",
  "ExpressionAttributeNames": {
    "#e6b00": "historic_download"
  },
  "ExpressionAttributeValues": {
    ":e6b00": {
      "S": "IN_PROGRESS"
    }
  }
}

Stack Overflow: Is there a way to convert numbers to strings in Step Functions?
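The API request URL case mentioned earlier could be sketched like this. The URL, the user_id field, and the state name are made up for illustration.

```json
{
  "Comment": "Illustrative only: the URL, field name, and state name are hypothetical",
  "StartAt": "BuildUrl",
  "States": {
    "BuildUrl": {
      "Type": "Pass",
      "Parameters": {
        "url.$": "States.Format('https://api.example.com/users/{}/orders', $.user_id)"
      },
      "End": true
    }
  }
}
```

Each {} in the template is replaced by the next argument, so an input of {"user_id": "42"} should produce a url ending in /users/42/orders.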

States.StringToJson

The documentation on this is clear (linked just above); what is not clear is when it would be useful. Well, I can tell you right now: this is a REQUIRED function when you are executing a state machine from inside a state machine. Embedded state machines, if you will. These are wildly useful, because you can create a single state machine to accomplish one task and then reuse that workflow in multiple places without recreating all the steps every time. The only trick is that when an embedded state machine returns its output to the parent state machine, the output is embedded as a string… (oh my god why). I faffed around for about 6 hours trying to figure out how to fix this before I finally went back and checked the list of intrinsic functions, and this solved my problem immediately.

So remember, when you embed a state machine (step function) inside another state machine, the output of the embedded state machine has been stringified and you need to convert it back to JSON with States.StringToJson if you want to combine it with your machine state and pass it on to the next step.
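As a rough sketch, assume an earlier step stored the embedded state machine's result under child_result (a hypothetical name), so the stringified output sits at $.child_result.Output. A Pass state can then parse it back into JSON:

```json
{
  "Comment": "Illustrative only: child_result, parsed, and the state name are hypothetical",
  "StartAt": "ParseChildOutput",
  "States": {
    "ParseChildOutput": {
      "Type": "Pass",
      "Parameters": {
        "child.$": "States.StringToJson($.child_result.Output)"
      },
      "ResultPath": "$.parsed",
      "End": true
    }
  }
}
```

The same expression should also work inside a ResultSelector on the task that starts the embedded state machine, which saves a step. Newer definitions may use the startExecution.sync:2 resource, which AWS documents as returning the output already parsed as JSON, in which case the conversion is not needed.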

Common Errors

Here are some problems that I have encountered and the solutions to those problems.

Embedded Step Function not returning output

Let us say that we have a step function with two embedded step functions inside it that are called in order. The first retrieves an access token for an API, and the second makes the API call. You can run into an issue where the first state machine does not appear to return an output to the parent. If this is the case, you probably need to check the Wait for task to complete option in the step configuration.

wait for task to complete checkbox

The diagram makes it look as though each step naturally waits for the one before it, and for most tasks that is true: steps in a row run one after another. A nested Start Execution step is the exception. By default it only starts the child state machine and moves on immediately, so the child's output never reaches the parent. Checking Wait for task to complete makes the parent pause until the child finishes and hands back its output. This is counter-intuitive given the visual layout of these state machines, but there you go.
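In plain ASL, that checkbox shows up as a suffix on the task's Resource. The sketch below assumes two hypothetical child state machines (the ARNs and field names are placeholders) and uses the .sync suffix so that each step waits for its child to finish.

```json
{
  "Comment": "Illustrative only: state machine ARNs and field names are hypothetical",
  "StartAt": "GetAccessToken",
  "States": {
    "GetAccessToken": {
      "Type": "Task",
      "Resource": "arn:aws:states:::states:startExecution.sync",
      "Parameters": {
        "StateMachineArn": "arn:aws:states:us-east-1:123456789012:stateMachine:get-access-token",
        "Input": { "client_id.$": "$.client_id" }
      },
      "ResultPath": "$.token_run",
      "Next": "CallApi"
    },
    "CallApi": {
      "Type": "Task",
      "Resource": "arn:aws:states:::states:startExecution.sync",
      "Parameters": {
        "StateMachineArn": "arn:aws:states:us-east-1:123456789012:stateMachine:call-api",
        "Input": { "token_output.$": "$.token_run.Output" }
      },
      "End": true
    }
  }
}
```

Without the .sync suffix, the Resource would be arn:aws:states:::states:startExecution, and the task would return as soon as the child execution has started, without its output.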

Pass States Will Not Transform/Map Data in State

I previously incorrectly stated that pass states COULD NOT transform data. I was wrong. Thank you Joe Vidalis for correcting me.

If you are having trouble getting your Pass states to actually transform the data coming into them, you are probably missing this setting.

Given a simple pass state that receives an array of dogs like so

dogs

and we want to transform that output into a favorite dog.

favorite dog

The key is to use the Transform input with Parameters setting.

transform input setting

Which in plain ASL translates to this:

"Favorite Dog": {
  "Type": "Pass",
  "End": true,
  "Parameters": {
    "favorites": {
      "favorite_dog.$": "$.dog1.breed"
    }
  },
  "OutputPath": "$.favorites"
}

Lastly here is a relevant StackOverflow thread on this issue:

AWS Step Function - Adding dynamic value to Pass state type

States.DataLimitExceeded

It turns out that there is a limit on the size of the state payload that is passed from task to task (the state payload being the JSON object, holding all our data, that we pass between tasks and manipulate with ASL). That size limit is 256 kilobytes. If you exceed it, your state machine will fail on execution and you will get the following error.

{
  "error": "States.DataLimitExceeded",
  "cause": "The state/task 'THE TASK THAT RETURNED THE OVERSIZED STATE PAYLOAD' returned a result with a size exceeding the maximum number of bytes service limit."
}

It is actually quite easy to run into this error if you are using the map state type to complete a large number of tasks. What is happening is that once all the actions inside the map state are complete the step function tries to combine the output of every iteration into an array and pass that on to the next step, even if the next step is END.

For example, you could have a step function where the last task is a Map state that saves something to a database in every iteration, and when that is complete the step function ends. In this scenario, all of the items could save to the database successfully (which was the goal of the step function), but then the step function itself fails at the last possible second, even though it completed its goal. All because it tried to combine the outputs of the final states from each iteration into one giant state payload that you weren’t even planning on using.

The simple solution in this instance is to filter the final output of each iteration in the map state so that each iteration is not outputting unnecessary data that will cause your state machine to fail.

transform result

In my case I have the task above running up to 1200 times in one execution and the combined output of all those executions exceeded the payload limit. However if I just transformed the result into a simple message “SUCCESS” I was back down below the payload limit.
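In ASL terms, the trimming can be sketched with a ResultSelector on the task inside the Map state. The table name, item shape, and state names below are assumptions, not my actual workflow. The important line is the ResultSelector, which replaces the full DynamoDB response with a tiny fixed object.

```json
{
  "Comment": "Illustrative only: table name, item shape, and state names are hypothetical",
  "StartAt": "SaveProfiles",
  "States": {
    "SaveProfiles": {
      "Type": "Map",
      "ItemsPath": "$.profiles",
      "ItemProcessor": {
        "ProcessorConfig": { "Mode": "INLINE" },
        "StartAt": "SaveProfile",
        "States": {
          "SaveProfile": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {
              "TableName": "example_table",
              "Item": {
                "id": { "S.$": "$.id" }
              }
            },
            "ResultSelector": { "status": "SUCCESS" },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
```

Each iteration then contributes only {"status": "SUCCESS"} to the array that the Map state assembles at the end.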

Alternatively, you could just instruct the state machine to discard the result from this step.

discard the output from step
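In ASL, discarding the result corresponds to setting ResultPath to null on the task. This is a minimal sketch with a hypothetical table and item shape.

```json
{
  "Comment": "Illustrative only: table name, item shape, and state name are hypothetical",
  "StartAt": "SaveItem",
  "States": {
    "SaveItem": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "example_table",
        "Item": {
          "id": { "S.$": "$.id" }
        }
      },
      "ResultPath": null,
      "End": true
    }
  }
}
```

With ResultPath set to null, the task's result is thrown away and the state's input becomes its output. Inside a Map state, that means each iteration still outputs its own input item, so this only helps if those items are small.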

This is extremely useful, and I would make it the de facto default for any step that does not output a useful piece of state.