A Five Minute Overview of AWS Step Functions

I recently had the opportunity to do a little research into AWS Step Functions. What I learned was interesting and sparked some thinking about how I could improve some of my serverless projects by moving to a Step Function. This article discusses what AWS Step Functions are, how they fit into an application architecture, and their benefits.

AWS Step Functions are a “function orchestrator”, allowing you to connect multiple Lambda functions and other AWS services into an application. By moving the application logic, including decisions, retries, parallel tasks, and error handling, out of the Lambda functions, we can reduce the amount of code needed to construct the application, simplifying updates and reducing code complexity. However, before diving into Step Functions, we need to talk about some terminology.

Each component or task should be considered a microservice. A microservice is a small piece of code implementing a single task. It can be updated independently of other microservices, provided its inputs and outputs do not change. Microservices make heavy use of messaging to control state, and they are developed autonomously and decentralized.

However, managing all these microservices creates a different form of complexity from that of the monolithic application: interdependence. As we make more and smaller functions, we create interdependencies between those functions.

With monolithic systems, complexity arises as the code base grows. There is often interdependence between elements in the code base, and as it grows in size, changes become more complex. Monoliths typically require deploying the entire application as a whole. Testing and quality assurance of a large code base are also more complex.

AWS Step Functions use tasks to perform work, and states to control the logic flow, decisions, retries, error handling, parallel processing, and timeouts. By moving the logic from the task-specific code, those code modules become smaller and easier to maintain.
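As a minimal sketch of what moving retries and error handling into the state machine can look like, the Python snippet below builds a Task state with the Retry and Catch fields from the Amazon States Language and prints it as JSON. The state names and the function ARN placeholder are borrowed from the excerpt later in this article; the retry values are assumptions chosen only for illustration.

```python
import json

# Hypothetical Task state: the retry policy and the error route live in the
# state machine definition, not in the Lambda function's code.
save_pair_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:poc-save-db",
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,  # wait before the first retry (assumed value)
            "MaxAttempts": 3,      # stop retrying after three attempts (assumed value)
            "BackoffRate": 2.0,    # double the wait between retries (assumed value)
        }
    ],
    "Catch": [
        # Errors that are not retried, or that outlast the retries, go here.
        {"ErrorEquals": ["States.ALL"], "Next": "SaveFail"}
    ],
    "Next": "SavedToDB",
}

definition_fragment = json.dumps({"SavePair": save_pair_state}, indent=2)
print(definition_fragment)
```

With this in place, the Lambda function itself needs no retry loop or failure routing; it only has to raise an error when the save fails.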

The microservice and step function approach can be used to change the design and implementation of a monolith, making the code base more manageable and improving the ability to maintain the application.

The final piece is the state machine. In many programs, we handle “state” by creating variables and evaluating those variables for specific content or state. This is most commonly implemented using programming language constructs like if-then-else, switch, and case. A state machine describes the various states the machine can be in at any given time. The focus is on the state and not on the variable. As tasks are executed, the state changes; these changes are called transitions.
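To make the idea concrete, here is a small sketch of a state machine written as a transition table in Python rather than as nested if-then-else statements. The state and event names are hypothetical and only loosely modeled on the workflow discussed in this article; this is not how Step Functions is implemented internally.

```python
# Hypothetical states and events, for illustration only.
TRANSITIONS = {
    ("Start", "begin"): "SavePair",
    ("SavePair", "saved"): "NotifyPair",
    ("SavePair", "rejected"): "SaveFail",
}


def next_state(current: str, event: str) -> str:
    """Look up the transition; an unknown pair is an error, not a silent default."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"no transition from {current!r} on {event!r}")


state = "Start"
for event in ["begin", "saved"]:
    state = next_state(state, event)
print(state)  # prints NotifyPair
```

The table is the whole design: adding a new path means adding a row, not threading another flag variable through the code.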

The “intelligence” behind Step Functions is the state machine. The state machine is a JSON document describing the tasks and states in the workflow. Here is an excerpt illustrating a Task state and a Choice state.

"SavePair": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:poc-save-db",
  "Next": "SavedToDB"
},
"SavedToDB": {
  "Type": "Choice",
  "Choices": [
    {
      "And": [
        { "Variable": "$.statusCode", "NumericEquals": 200 },
        { "Variable": "$.body.first", "IsPresent": true },
        { "Variable": "$.body.second", "IsPresent": true }
      ],
      "Next": "NotifyPair"
    },
    {
      "Variable": "$.statusCode",
      "NumericEquals": 400,
      "Next": "SaveFail"
    }
  ]
}

The SavePair task executes the Lambda function named by the Resource keyword, and the Next keyword names the state that runs afterward. When SavePair executes, its input comes from the previous step, and its output is passed to the state named by Next.

The SavedToDB state is a Choice. Its first rule matches when the status code is 200 and both variables in the And section are present in the input; its second rule matches a status code of 400. If no rule matches and the state defines no Default, the execution fails with a States.NoChoiceMatched error and the workflow exits. This means some planning and design are necessary before actually implementing the JSON state machine template. The state machine uses a prescribed language called the Amazon States Language. The language specification defines the various tasks and states available to use in the state machine design.
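The following Python sketch walks through how the Choice rules in the excerpt are evaluated against an input document. It is a deliberately simplified stand-in, supporting only the And, NumericEquals, and IsPresent forms and plain "$.a.b" paths, and is not the Step Functions engine. The helper names (lookup, matches, choose) are made up for this example.

```python
# The Choices array from the excerpt, written as Python data.
CHOICES = [
    {
        "And": [
            {"Variable": "$.statusCode", "NumericEquals": 200},
            {"Variable": "$.body.first", "IsPresent": True},
            {"Variable": "$.body.second", "IsPresent": True},
        ],
        "Next": "NotifyPair",
    },
    {"Variable": "$.statusCode", "NumericEquals": 400, "Next": "SaveFail"},
]


def lookup(data, path):
    """Resolve a simple '$.a.b' path. Returns (found, value)."""
    current = data
    for key in path[2:].split("."):
        if not isinstance(current, dict) or key not in current:
            return False, None
        current = current[key]
    return True, current


def matches(rule, data):
    """Evaluate the three rule forms used in the excerpt."""
    if "And" in rule:
        return all(matches(sub, data) for sub in rule["And"])
    found, value = lookup(data, rule["Variable"])
    if "IsPresent" in rule:
        return found == rule["IsPresent"]
    if "NumericEquals" in rule:
        return found and value == rule["NumericEquals"]
    raise NotImplementedError(f"unsupported rule: {rule}")


def choose(choices, data, default=None):
    """Return the Next of the first matching rule, as a Choice state would."""
    for rule in choices:
        if matches(rule, data):
            return rule["Next"]
    if default is None:
        # No rule matched and no Default: the execution fails.
        raise RuntimeError("States.NoChoiceMatched")
    return default


print(choose(CHOICES, {"statusCode": 200, "body": {"first": "a", "second": "b"}}))
print(choose(CHOICES, {"statusCode": 400}))
```

Running a few sample inputs through a sketch like this before deploying is one way to find inputs that no rule handles, which is usually a sign the Choice state needs a Default.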

As the declarative JSON is written, the workflow is graphically displayed, making it easy to understand how the transitions create the flow and branching to perform the work.

The Step Function Graphical View

The AWS Step Function state machine has the following states:

  • Pass — sends the input data to the output, performing no work;
  • Task — causes the work specified by the Resource keyword to be executed;
  • Choice — adds branching logic to select different paths based upon the conditions in the choice;
  • Wait — causes a delay in processing for the specified number of seconds or until a specific date and time;
  • Succeed — ends the state machine, or a parallel or map statement;
  • Fail — terminates the workflow and marks the execution as “failed”;
  • Parallel — causes the parallel execution of the identified branches; and,
  • Map — independently executes each item in the map, possibly in parallel.
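As a rough sketch of how the state types above fit together, the Python snippet below defines a tiny state machine as a dictionary and checks two things: that every state uses one of the eight types listed, and that every transition points at a state that exists. The check_definition helper is hypothetical and covers only a small part of what the service validates; for example, it does not descend into Parallel branches or Map iterators.

```python
STATE_TYPES = {"Pass", "Task", "Choice", "Wait", "Succeed", "Fail", "Parallel", "Map"}


def check_definition(definition):
    """Return a list of problems found in a state machine definition dict."""
    states = definition.get("States", {})
    problems = []
    if definition.get("StartAt") not in states:
        problems.append("StartAt does not name a defined state")
    for name, state in states.items():
        if state.get("Type") not in STATE_TYPES:
            problems.append(f"{name}: unknown Type {state.get('Type')!r}")
        # Collect every place this state can transition to.
        targets = [state.get("Next"), state.get("Default")]
        targets += [rule.get("Next") for rule in state.get("Choices", [])]
        for target in targets:
            if target is not None and target not in states:
                problems.append(f"{name}: transition to undefined state {target!r}")
    return problems


definition = {
    "StartAt": "Hold",
    "States": {
        "Hold": {"Type": "Wait", "Seconds": 5, "Next": "Echo"},
        "Echo": {"Type": "Pass", "Next": "Done"},
        "Done": {"Type": "Succeed"},
    },
}
print(check_definition(definition))  # prints []
```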

Work in a Step Functions workflow is performed by the available task providers.

Tasks are not restricted to Lambda functions. We can also:

  • Run an AWS Batch job and then perform different actions based on the results;
  • Insert or get an item from Amazon DynamoDB;
  • Run an Amazon Elastic Container Service (Amazon ECS) task and wait for it to complete;
  • Run an Elastic Kubernetes Service (EKS) task;
  • Publish to a topic in Amazon Simple Notification Service (Amazon SNS);
  • Send a message in Amazon Simple Queue Service (Amazon SQS);
  • Manage a job for AWS Glue or Amazon SageMaker;
  • Build workflows for executing Amazon EMR jobs;
  • Launch an AWS Step Functions workflow execution;
  • Call Amazon API Gateway REST and HTTP API endpoints, which can in turn run Lambda functions; and,
  • Execute Amazon Athena jobs.
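For a sense of what a non-Lambda task looks like, here is a sketch of a Task state that publishes to an Amazon SNS topic directly through a service integration. The Resource value is the SNS publish integration ARN; the topic ARN, the state name, and the choice of "$.body.first" as the message are assumptions made for this example.

```python
import json

notify_pair_state = {
    "Type": "Task",
    # Service integration: Step Functions calls SNS itself, no Lambda needed.
    "Resource": "arn:aws:states:::sns:publish",
    "Parameters": {
        # Hypothetical topic; REGION and ACCOUNT are placeholders.
        "TopicArn": "arn:aws:sns:REGION:ACCOUNT:poc-pair-topic",
        # A key ending in ".$" takes its value from the state input via a path.
        "Message.$": "$.body.first",
    },
    "End": True,
}

notify_fragment = json.dumps({"NotifyPair": notify_pair_state}, indent=2)
print(notify_fragment)
```

Using the integration removes a Lambda function whose only job would have been to forward a message.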

Support for the API Gateway endpoints was added in November 2020, so I would expect we will continue to see more and more service integrations.

As I worked through a sample proof of concept with AWS Step Functions, I learned a lot about designing the workflow and describing it in the declarative JSON. Here are some tips to help you as you look at using Step Functions to modernize an existing application or build a new implementation.

Take the time to do one or more of the available tutorials. It will save you time and frustration.

Design the workflow upfront.

I can’t stress this enough. How often do we just sit down and write a piece of code without any design? Often. We all do it. Skipping that step will cause you challenges later. Take the time to draw a flowchart or process map identifying the steps to be done, the decisions/logic between steps, along with the input needed and the output produced. Doing this now will save you significant frustration later.

The work has to be thought of in very small pieces.

When I say small, I mean break the work down into the smallest piece and design a task to perform that work.

For example, let’s consider writing a letter to Mom. You know, write the letter and take it to the mailbox. Right? Well, not quite from the perspective of the Step Function.

  1. Get a piece of note paper.
  2. No note paper.
  3. Do we have more note paper?
  4. No. Run workflow “buy note paper”.
  5. Yes. Open package. Go to step 1.
  6. Pick up pen.
  7. Does it work?
  8. No. Get another pen. Select new pen. Go to step 6.
  9. Yes.
  10. Write letter.
  11. Fold paper.
  12. Get envelope.
  13. No envelope. Run workflow “buy envelope”.
  14. Yes. Insert letter.
  15. Write Mom’s address on envelope.
  16. Write my return address on envelope.
  17. Seal envelope.
  18. Get a stamp.
  19. No stamps.
  20. Do we have more stamps?
  21. No. Run workflow “buy stamps”.
  22. Yes. Go to step 18.
  23. Apply postage.
  24. Run workflow “mail letter”.

This may look like overkill, but the point is from here we can make certain decisions about optimizing the workflow process only after we understand what the process is. For example, after writing this down, we could change the workflow to initially evaluate the inputs at the beginning of the workflow such as:

  • Do we have paper?
  • Do we have a pen?
  • Do we have an envelope?
  • Do we have a stamp?

If the inputs are not available, we should stop the workflow until the inputs are. We can then simplify this process and avoid doing work again and paying for execution time. It makes it easier to identify unnecessary or non-value-add work. (Your Lean Six Sigma colleagues will love Step Functions.) Going through this process also makes it easier to write small functions.
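The up-front check could be sketched as a single small function that runs as the first task of the workflow. The inventory format and the "ready" and "missing" field names are assumptions for this example; the idea is only that the function reports what is missing and a later Choice state decides whether to continue.

```python
REQUIRED_SUPPLIES = ("paper", "pen", "envelope", "stamp")


def missing_supplies(inventory):
    """Return the required supplies that are absent or at zero count."""
    return [item for item in REQUIRED_SUPPLIES if inventory.get(item, 0) <= 0]


def precheck(inventory):
    """Shape a result that a Choice state could branch on (hypothetical fields)."""
    missing = missing_supplies(inventory)
    return {"ready": not missing, "missing": missing}


print(precheck({"paper": 10, "pen": 1, "envelope": 0}))
# prints {'ready': False, 'missing': ['envelope', 'stamp']}
```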

When I started in technology over 30 years ago, a colleague told me “if your function is more than 100 lines including comments, you are doing too much, or making it overly complicated”.

Don’t make the decisions in the Lambda function; put those in the state machine.

Unless you have to implement some form of logic in the Lambda function or task provider to yield the desired result, don’t do it. Otherwise, your function is making the control decisions and giving the state back to the state machine instead of allowing the state machine to control the execution.
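Here is a sketch of what such a thin Lambda function might look like in Python. The handler does its one job and reports the outcome in the statusCode and body fields that the Choice state in the earlier excerpt inspects; it does not notify anyone or handle the failure path itself. The save_to_db helper is a hypothetical stand-in for a real database write.

```python
def save_to_db(pair):
    """Stand-in for the real database write; hypothetical helper."""
    return "first" in pair and "second" in pair


def lambda_handler(event, context):
    """Do the work and report what happened; leave routing to the state machine."""
    pair = event.get("body", {})
    saved = save_to_db(pair)
    # Report the outcome only. Which state runs next is the Choice state's call.
    return {"statusCode": 200 if saved else 400, "body": pair}


print(lambda_handler({"body": {"first": "a", "second": "b"}}, None))
```

Because the function never calls the notification or failure logic directly, either path can be changed in the state machine without redeploying the function.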

Remember to output the data.

The inputs to a task or state are processed and some output is generated. The output is then sent to the next step as input. If function 1 creates output that is needed by a later step, make sure any intermediate steps also send that data through. This is because, except for the Pass state, a state's output is not necessarily the same as its input.
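One way to keep data flowing is the ResultPath field in the Amazon States Language, which attaches a task's result to the original input instead of replacing it. The Python sketch below only imitates that behavior for a single top-level key so the difference is visible; the function names, the "saveResult" key, and the sample data are made up for this example.

```python
import copy


def without_result_path(state_input, result):
    """Default behavior: the task result replaces the input entirely."""
    return result


def apply_result_path(state_input, result, key):
    """Imitate ResultPath '$.<key>': keep the input, attach the result under key."""
    output = copy.deepcopy(state_input)
    output[key] = result
    return output


state_input = {"first": "a", "second": "b", "replyTo": "mom@example.com"}
result = {"statusCode": 200}

print(without_result_path(state_input, result))
print(apply_result_path(state_input, result, "saveResult"))
```

In the first case the later steps never see "replyTo"; in the second it is still there alongside the save result.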

This has been a very quick look at AWS Step Functions, along with some hints to make getting into your first Step Function easier. Remember, not every Lambda should be converted to a Step Function. The goal of Step Functions is to control a workflow or process. If you have a large Lambda function with lots of logic branching, maybe it could be a Step Function candidate. I would go back to my list of things to do and draw out the process the Lambda function implements. Then, you can decide to change it or leave it alone.

Chris is a highly-skilled Information Technology, AWS Cloud, Training and Security Professional bringing cloud, security, training, and process engineering leadership to simplify and deliver high-quality products. He is the co-author of seven books and author of more than 70 articles and book chapters in technical, management, and information security publications. His extensive technology, information security, and training experience make him a key resource who can help companies through technical challenges. Chris is a member of the AWS Community Builder Program.

This article is Copyright © 2020, Chris Hare.
