IoT Cloud Platforms: Foundations and Architectural Patterns
Introduction – Understanding the Challenges
Manageability of Distributed Devices at a Massive Scale
Real-world IoT solutions often have a large fleet of devices deployed across a geographically distributed area. These devices can be heterogeneous in nature – supplied by different manufacturers, with varying hardware specifications and feature capabilities. Moreover, field devices often have poor or intermittent connectivity to backbone networks.
This creates a critical need to reliably interconnect devices over unreliable underlying networks. The massive scale and heterogeneity also make the fleet harder to manage effectively. Remote administration and servicing in the field quickly become an expensive and painstaking proposition for the operator.
Moreover, gathering massive amounts of data securely from remote devices to derive real business insights can be cumbersome – making it harder to justify the ROI on your IoT investments. IoT Cloud platforms help address these challenges.
IoT Cloud Platforms
Offering Interconnectivity, Visibility, and Remote Management
Major cloud providers today offer IoT Platforms. These platforms enable interconnectivity across heterogeneous devices to provide real-time visibility of your fleet, remote monitoring, and remote manageability – at scale! This approach also brings elasticity, so you only pay-per-use and avoid fixed and upfront capital expenditure on your computing infrastructure.
In this article we explore the common architectural patterns and foundational concepts in IoT Cloud platforms. The aim is to help you understand the key considerations in building a mature architecture for your next IoT project and to make a well-informed assessment before choosing a suitable IoT Cloud Provider. Let’s dive right in.
An Inventory of Managed Devices
IoT Cloud platforms offer an asset registry. This is a master inventory of all your deployed devices. The registry is essentially a cloud-database which tracks a unique ID for each device and records additional attributes about it. The asset registry thus becomes a starting point for you to list, search, monitor, and manage your devices. Commonly tracked attributes can include:
· Device ID or serial number
· Client certificate unique to each device
· Device type, model, manufacturer information
· Version of the installed OS and firmware
· List of device capabilities (CPU Type, Available IO Ports etc.)
· Device location
· Dates of device onboarding and last firmware update
As a solution developer, you can also include additional attributes in the device registry as your business requirements dictate. Device information can be added to the registry manually by the operator during solution deployment, either one device at a time or by importing information in bulk. Alternatively, devices can self-enroll into the registry when they are first powered up in the field. This process is called just-in-time enrollment.
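To make the registry concrete, here is a minimal in-memory sketch in Python. It is not any vendor's schema or API: the class names (DeviceRecord, AssetRegistry), the field names, and the sample devices are all assumptions chosen to mirror the attributes listed above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DeviceRecord:
    """One registry entry; field names are illustrative, not a vendor schema."""
    device_id: str
    device_type: str
    manufacturer: str
    firmware_version: str
    location: Optional[str] = None
    onboarded_on: Optional[date] = None
    attributes: Dict[str, str] = field(default_factory=dict)  # custom attributes


class AssetRegistry:
    """Hypothetical in-memory stand-in for the cloud-hosted registry."""

    def __init__(self) -> None:
        self._devices: Dict[str, DeviceRecord] = {}

    def register(self, record: DeviceRecord) -> None:
        # Device IDs must be unique across the fleet.
        if record.device_id in self._devices:
            raise ValueError(f"duplicate device id: {record.device_id}")
        self._devices[record.device_id] = record

    def bulk_import(self, records: List[DeviceRecord]) -> int:
        # Register many devices in one call and report how many were added.
        for record in records:
            self.register(record)
        return len(records)

    def search(self, **criteria: str) -> List[DeviceRecord]:
        # Return the devices whose named fields equal every given value.
        return [
            d for d in self._devices.values()
            if all(getattr(d, key, None) == value for key, value in criteria.items())
        ]


registry = AssetRegistry()
registry.bulk_import([
    DeviceRecord("sn-001", "thermostat", "Acme", "1.2.0", location="Plant A"),
    DeviceRecord("sn-002", "thermostat", "Acme", "1.3.0", location="Plant B"),
    DeviceRecord("sn-003", "camera", "Globex", "2.0.1", location="Plant A"),
])
outdated = registry.search(device_type="thermostat", firmware_version="1.2.0")
print([d.device_id for d in outdated])
```

A real registry would be a managed cloud database with richer queries, but the same three operations (register, bulk import, search) cover the workflow described above.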
Establishing Device Identity and Authentication
Using Client Certificates to Securely Identify Devices To The Cloud
How does the cloud platform really know who the request is coming from or whether to trust the request from an IoT device? Could it be an impostor or a malicious source faking that request?
A simple approach could be to have a ‘shared secret’ (password) between your devices and the cloud. The device presents this ‘shared secret’ to the cloud to authenticate and then receives a session token in return. This is considered weak: the secret has to be stored on both sides and sent over the network, so it can be captured by an eavesdropper or a man-in-the-middle, and if the same secret is reused across devices, one leak exposes them all.
A more reliable approach is to rely on public-key cryptography. Every device is allocated a unique client certificate. The device presents this certificate to the cloud to establish its identity. Here is how it works:
Before the device is provisioned in the field:
1. A public and private key-pair is generated which is unique to this device.
2. A Certificate Signing Request (CSR) is generated for each device based on this key-pair.
3. Most IoT Cloud Platforms have a Certificate Authority (CA) which is capable of issuing digital certificates.
4. The CSR is submitted to this CA for signing.
5. Based on this CSR, the CA issues a digital certificate (X.509 Certificate) for that device.
6. The private key and the digital certificate are then installed (burned) on the device. The private key is expected to be kept secure. The public key and the digital certificate can be, of course, publicly known.
7. The Cloud Platform stores the device certificate in the device registry for future reference. Note: The Cloud does not know or store the device’s private key hereafter.
After the device is provisioned in the field and comes online:
1. Device attempts to connect with the cloud platform.
2. The cloud asks the device to present its device certificate. The device sends the certificate, and the cloud verifies that it is valid and was issued by a trusted CA.
3. The cloud then sends a challenge to the device (in TLS, this is data from the handshake itself) so the device can prove that it holds the private key matching the certificate.
4. The device signs the challenge using its private key and sends the signature back to the cloud.
5. The cloud verifies the signature using the public key in the device certificate to confirm that the device indeed holds the private key.
6. The cloud now completes establishing a secure TLS session (for example, HTTPS) with the device.
7. These steps are followed every time a device attempts to connect with the cloud.
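In practice, the exchange above is what a mutually authenticated TLS handshake does, so neither side implements the challenge by hand. The sketch below uses Python's standard ssl module to show how each side might be configured. The function names and file-path parameters are placeholders, and a real cloud endpoint would also load its own server certificate.

```python
import ssl


def build_cloud_tls_context(trusted_ca_file=None):
    """Server-side context that refuses clients without a valid certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the server request a client certificate and abort
    # the handshake if none is presented or it fails validation.
    context.verify_mode = ssl.CERT_REQUIRED
    if trusted_ca_file is not None:
        # CA that issued the device certificates (path is a placeholder).
        context.load_verify_locations(cafile=trusted_ca_file)
    return context


def build_device_tls_context(device_cert_file, device_key_file):
    """Client-side context that presents the device certificate."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # The private key stays on the device; the TLS library uses it to sign
    # handshake data, which proves possession of the key to the cloud.
    context.load_cert_chain(certfile=device_cert_file, keyfile=device_key_file)
    return context


cloud_context = build_cloud_tls_context()
print(cloud_context.verify_mode == ssl.CERT_REQUIRED)
```

The device-side function is not called here because it needs real certificate and key files on disk.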
Easing Manageability of the Fleet
Managing devices one at a time can be cumbersome. This is where groups come in. Cloud platforms allow you to group devices based on their identity or specific device properties. For example, you can group all temperature sensors that have a specific ARM CPU and a specific firmware version.
You can then issue commands, modify the configuration, or roll out a firmware update to that entire group of devices at once. Such group operations make the fleet far easier to manage.
Groups can be statically defined (i.e. devices with specific inherent device properties grouped together), or they can be dynamically defined (based on the current ‘state’ or dynamic values of specific variables in the devices).
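One way to picture the difference is to treat a group as a membership rule evaluated against the fleet. The Python sketch below does this under simple assumptions: the Device class, the property names, and the battery threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Device:
    device_id: str
    properties: Dict[str, str]                             # fixed at onboarding
    state: Dict[str, float] = field(default_factory=dict)  # changes at runtime


def select(devices: List[Device], rule: Callable[[Device], bool]) -> List[str]:
    """Evaluate a membership rule against the fleet and return matching IDs."""
    return [d.device_id for d in devices if rule(d)]


fleet = [
    Device("t-1", {"type": "temperature", "cpu": "arm", "firmware": "1.2.0"}, {"battery": 0.15}),
    Device("t-2", {"type": "temperature", "cpu": "arm", "firmware": "1.3.0"}, {"battery": 0.80}),
    Device("c-1", {"type": "camera", "cpu": "x86", "firmware": "2.0.1"}, {"battery": 0.10}),
]


def arm_sensors_on_1_2(d: Device) -> bool:
    # Static group: membership depends only on inherent device properties.
    p = d.properties
    return p["type"] == "temperature" and p["cpu"] == "arm" and p["firmware"] == "1.2.0"


def low_battery(d: Device) -> bool:
    # Dynamic group: membership depends on the current reported state.
    return d.state.get("battery", 1.0) < 0.2


static_members = select(fleet, arm_sensors_on_1_2)
dynamic_before = select(fleet, low_battery)

fleet[1].state["battery"] = 0.05  # t-2 drains, so it joins the dynamic group
dynamic_after = select(fleet, low_battery)
print(static_members, dynamic_before, dynamic_after)
```

The static group stays the same until a property such as the firmware version changes, while the dynamic group has to be re-evaluated whenever devices report new state.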
Pub-Sub Communication Model
A Decoupled Architecture for D2D and D2C Communication
IoT solutions need robust Device-to-Device (D2D) and Device-to-Cloud (D2C) communication. The network topology can be quite dynamic, with devices and sensors connecting and disconnecting intermittently.
Traditional protocols such as HTTP are request-response centric and are designed for point-to-point communications where both sender and recipient are expected to be present at the same time (i.e. temporal coupling). Such protocols are often unsuitable for IoT needs.
Publish-Subscribe offers an alternate and resilient communications model for distributed networks. Here, all devices first connect to an intermediate messaging broker.
Figure 1: An Illustration of the Pub-Sub Communications Model
Certain devices (say, sensors) can publish messages to topics in the broker and certain other devices (say, alarms) can subscribe to those topics. Messages can contain telemetry information and each message specifies one topic it belongs to.
The primary function of the broker is to route all inbound messages destined for a specific topic to all subscribers of that topic. The broker is thus a store-and-forward engine which can ‘fan out’ incoming messages to multiple recipients depending on the interests of those recipients.
The senders and receivers do not directly know about each other’s presence and availability – thus providing a greater degree of temporal de-coupling compared to point-to-point communications.
If a device goes offline (after subscribing to a topic), the broker can also queue messages for that device. The broker will deliver these queued messages to that device when the device reconnects again in the future. This provides a powerful model for asynchronous communications.
All D2D and D2C chatter flows through the broker. The Pub-Sub mechanism and the messaging broker thus form the communications backbone of most IoT Platforms today. MQTT is one such protocol specification that adopts the publish-subscribe communication model.
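The broker behaviour described above (fan-out to subscribers, plus queueing for subscribers that are offline) can be sketched in a few lines of Python. This is a toy in-memory model, not MQTT or any real broker: the Broker class, its method names, and the topic string are assumptions made for illustration.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Set, Tuple

Handler = Callable[[str, str], None]  # receives (topic, payload)


class Broker:
    """Toy store-and-forward broker: fan-out plus queueing for offline clients."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[str]] = defaultdict(set)  # topic -> client ids
        self._online: Dict[str, Handler] = {}
        self._queued: Dict[str, Deque[Tuple[str, str]]] = defaultdict(deque)

    def connect(self, client_id: str, handler: Handler) -> None:
        self._online[client_id] = handler
        # Deliver anything that was queued while the client was offline.
        while self._queued[client_id]:
            handler(*self._queued[client_id].popleft())

    def disconnect(self, client_id: str) -> None:
        self._online.pop(client_id, None)

    def subscribe(self, client_id: str, topic: str) -> None:
        self._subscribers[topic].add(client_id)

    def publish(self, topic: str, payload: str) -> None:
        # The publisher never learns who, if anyone, receives the message.
        for client_id in sorted(self._subscribers.get(topic, ())):
            handler = self._online.get(client_id)
            if handler is not None:
                handler(topic, payload)
            else:
                self._queued[client_id].append((topic, payload))


received: Dict[str, List[Tuple[str, str]]] = {"alarm": [], "dashboard": []}


def inbox(name: str) -> Handler:
    return lambda topic, payload: received[name].append((topic, payload))


broker = Broker()
broker.connect("alarm", inbox("alarm"))
broker.connect("dashboard", inbox("dashboard"))
broker.subscribe("alarm", "plant-a/temperature")
broker.subscribe("dashboard", "plant-a/temperature")

broker.disconnect("dashboard")                    # the dashboard drops off the network
broker.publish("plant-a/temperature", "41.5")     # a sensor publishes one reading
seen_while_offline = list(received["dashboard"])  # nothing delivered yet
broker.connect("dashboard", inbox("dashboard"))   # reconnect flushes the queue
print(received)
```

The sensor publishes once. The alarm receives the reading immediately and the dashboard receives it on reconnect, without the sensor knowing about either subscriber.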
What is a Device Permitted to Do?
Can a device publish messages to any topic? Subscribe to any topic? In case a device’s security is breached, can we limit the collateral damage to the system from that device? This is where device policies and authorization controls come into the picture.
A policy is a set of rules associated with each device (specifically, the policy is associated with the device’s certificate). Policies can whitelist or blacklist what a specific device can really do at runtime. For example:
· Which topics the device is permitted to subscribe and publish to.
· Which operations the device may perform on the device shadow (see later).
· Which operations the device may perform on ongoing jobs (see later).
Once deployed, the messaging broker and the IoT Platform will enforce these policies at runtime. Messages will be routed (or blocked) in adherence to the policy. The policy can be applied to individual devices or, usually, to a group of devices.
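As a rough illustration, a policy can be modelled as allow and deny lists of topic patterns that the broker consults before routing a message. The sketch below is an assumption-laden toy: the Policy class is hypothetical, deny rules are assumed to override allow rules, and it uses shell-style wildcards from Python's fnmatch module rather than the wildcard syntax of any particular platform.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List


@dataclass
class Policy:
    """Hypothetical device policy; patterns use shell-style '*' wildcards."""
    allow_publish: List[str] = field(default_factory=list)
    allow_subscribe: List[str] = field(default_factory=list)
    deny: List[str] = field(default_factory=list)

    def permits(self, action: str, topic: str) -> bool:
        if action not in ("publish", "subscribe"):
            raise ValueError(f"unknown action: {action}")
        # An explicit deny always wins over an allow (assumed precedence).
        if any(fnmatchcase(topic, pattern) for pattern in self.deny):
            return False
        allowed = self.allow_publish if action == "publish" else self.allow_subscribe
        # Anything not explicitly allowed is rejected.
        return any(fnmatchcase(topic, pattern) for pattern in allowed)


sensor_policy = Policy(
    allow_publish=["plant-a/sensors/t-1/*"],
    allow_subscribe=["plant-a/commands/t-1"],
    deny=["plant-a/sensors/*/admin"],
)

print(sensor_policy.permits("publish", "plant-a/sensors/t-1/temperature"))
print(sensor_policy.permits("publish", "plant-a/sensors/t-2/temperature"))
print(sensor_policy.permits("publish", "plant-a/sensors/t-1/admin"))
```

Scoping each device to its own topics in this way is what limits the damage if that device is compromised: it cannot publish as another device or subscribe to topics it was never granted.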
Triggering Actions Based on Events and Conditions
What do we really do with all the inbound telemetry messages? Route them somewhere? Perform specific actions based on what’s inside that message? Process, modify, and resend the modified message to someone else? The rules-engine provides these capabilities.
The rules engine is fed with a set of rules as part of its configuration and all the inbound messages are fed into this rules-engine for processing in near-real-time. Each rule consists of:
· Rule name – Human readable name of the rule.
· Matching criteria – Does the input message match this criterion? If so, trigger the Action.
· Action – Specifies what should be really done if the message matches the criteria.
Figure 2: Rules Engine
A wide range of actions can be performed when a specific rule gets triggered:
· Forward the message to another topic.
· Store the message to a cloud database.
· Write the message to a flat-file or log.
· Invoke application logic (function) and pass along the message as input to that function.
· Add the message to a specific queue for further processing.
· Update the state of a “state machine” in the cloud.
You could of course write entirely custom logic to process every inbound message (without needing a rules-engine). However, the rules-engine provides a structured and elegant framework to handle all your inbound messages. Moreover, these engines are designed to scale massively and handle millions of messages every day.
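The rule structure described above (name, matching criteria, action) maps naturally onto a small data type. The following Python sketch is a simplified illustration: the Rule and RulesEngine classes, the rule names, and the 80-degree threshold are all assumptions, and the actions simply append to lists where a real platform would call a database, queue, or function.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


@dataclass
class Rule:
    name: str                            # human-readable rule name
    matches: Callable[[Message], bool]   # matching criteria
    action: Callable[[Message], None]    # what to do when the criteria match


class RulesEngine:
    def __init__(self, rules: List[Rule]) -> None:
        self._rules = list(rules)

    def process(self, message: Message) -> List[str]:
        """Run every matching rule's action; return the names of rules that fired."""
        fired = []
        for rule in self._rules:
            if rule.matches(message):
                rule.action(message)
                fired.append(rule.name)
        return fired


archive: List[Message] = []  # stands in for a cloud database
alerts: List[Message] = []   # stands in for an alerting queue

engine = RulesEngine([
    Rule("archive-all", lambda m: True, archive.append),
    Rule("overheat-alert", lambda m: m.get("temperature", 0) > 80, alerts.append),
])

first = engine.process({"device": "t-1", "temperature": 72})
second = engine.process({"device": "t-1", "temperature": 91})
print(first, second)
```

Adding a new behaviour means adding a rule to the configuration rather than changing the message-handling code.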
Real-time Processing Of Data-In-Motion
While a rules-engine is a good way to process messages in real-time, it is often constrained in its expressive capability. An alternate class of engines is available today that is better suited to real-time stream processing.
Here, ‘stream’ refers to the slew of telemetry messages sent to the IoT cloud from thousands of field devices. You specify your processing logic as a series of processing nodes connected together as a Directed Acyclic Graph (DAG).
Figure 3: Stream Processing Workflow
Every message flows through the pathways of the DAG, and at each step specific processing logic operates on that message. The nodes in the DAG can perform operations such as:
· Message transformation
· Aggregation and windowing
· Conditional routing of the message within the DAG
· Logging the message
Stream processing thus offers greater flexibility and customizability compared to a rules-engine. Unlike traditional means of processing data ‘at rest’ (e.g. databases, data warehouses, filesystems), stream processing is optimized to process data-in-motion.
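A small pipeline shows how such nodes compose. In the Python sketch below, chained generators stand in for DAG nodes: a transformation, a tumbling-window average, and a conditional router. The function names, message fields, 60-second window, and threshold are all assumptions, and the window logic assumes readings arrive in timestamp order.

```python
from statistics import mean
from typing import Dict, Iterable, Iterator, List

Reading = Dict[str, float]


def to_celsius(stream: Iterable[Reading]) -> Iterator[Reading]:
    # Transformation node: convert each reading from Fahrenheit to Celsius.
    for r in stream:
        yield {"ts": r["ts"], "temp_c": (r["temp_f"] - 32) * 5 / 9}


def tumbling_average(stream: Iterable[Reading], window_seconds: int) -> Iterator[Reading]:
    # Aggregation node: average per fixed, non-overlapping time window.
    window_start = None
    values: List[float] = []
    for r in stream:
        start = r["ts"] - (r["ts"] % window_seconds)
        if window_start is not None and start != window_start:
            yield {"window_start": window_start, "avg_temp_c": mean(values)}
            values = []
        window_start = start
        values.append(r["temp_c"])
    if values:  # flush the final, possibly partial, window
        yield {"window_start": window_start, "avg_temp_c": mean(values)}


def route(stream: Iterable[Reading], threshold_c: float) -> Dict[str, List[Reading]]:
    # Routing node: send each window to one of two downstream branches.
    branches: Dict[str, List[Reading]] = {"alerts": [], "archive": []}
    for window in stream:
        key = "alerts" if window["avg_temp_c"] > threshold_c else "archive"
        branches[key].append(window)
    return branches


telemetry = [
    {"ts": 0, "temp_f": 68.0},
    {"ts": 30, "temp_f": 86.0},
    {"ts": 60, "temp_f": 212.0},
    {"ts": 90, "temp_f": 212.0},
]
result = route(tumbling_average(to_celsius(telemetry), window_seconds=60), threshold_c=50.0)
print(result)
```

Each message is processed as it arrives and only the current window is held in memory, which is the practical difference from querying data at rest.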
Supporting Heterogeneous Protocols with Protocol Translation
In an IoT solution, devices from different suppliers can use different protocols – MQTT, AMQP, HTTP, XMPP, CoAP etc. An IoT Cloud Platform, on the other hand, may support only a specific default / native protocol – say, MQTT only.
To interconnect these devices into a single solution, IoT Cloud Platforms offer a gateway which provides protocol adaptation. The gateway translates messages from one format to another in real-time with full bi-directional support, thus addressing the ‘impedance mismatch’ problems between protocols.
Here are some functions that the gateway can perform:
· Message transformations
· Message compression / decompression
· Message encryption / decryption
· Custom authentication
The protocol gateway is often an optional component, needed only when devices cannot speak the platform's native protocol. These gateways are often extensible, so you can integrate your proprietary or custom protocols into them as well.
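A common way to structure such a gateway is to give each protocol an adapter that converts to and from one canonical message, so adding a protocol means adding one adapter. The Python sketch below illustrates only the payload-translation part, under stated assumptions: the CanonicalMessage shape, the JSON body layout, and the comma-separated "proprietary" format are all invented for the example, and real gateways also handle transport, sessions, and authentication.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class CanonicalMessage:
    device_id: str
    metric: str
    value: float


class JsonAdapter:
    """Payload as it might appear in an HTTP request body (assumed layout)."""

    def decode(self, raw: bytes) -> CanonicalMessage:
        body = json.loads(raw)
        return CanonicalMessage(body["device_id"], body["metric"], float(body["value"]))

    def encode(self, msg: CanonicalMessage) -> bytes:
        body = {"device_id": msg.device_id, "metric": msg.metric, "value": msg.value}
        return json.dumps(body).encode("utf-8")


class CsvAdapter:
    """Hypothetical compact proprietary format: b'device_id,metric,value'."""

    def decode(self, raw: bytes) -> CanonicalMessage:
        device_id, metric, value = raw.decode("ascii").split(",")
        return CanonicalMessage(device_id, metric, float(value))

    def encode(self, msg: CanonicalMessage) -> bytes:
        return f"{msg.device_id},{msg.metric},{msg.value}".encode("ascii")


class Gateway:
    def __init__(self, adapters: Dict[str, object]) -> None:
        self._adapters = adapters

    def translate(self, raw: bytes, source: str, target: str) -> bytes:
        # Decode into the canonical form, then encode for the target protocol.
        message = self._adapters[source].decode(raw)
        return self._adapters[target].encode(message)


gateway = Gateway({"json": JsonAdapter(), "csv": CsvAdapter()})
as_json = gateway.translate(b"t-1,temperature,21.5", source="csv", target="json")
round_trip = gateway.translate(as_json, source="json", target="csv")
print(as_json, round_trip)
```

Because every adapter both decodes and encodes, translation works in either direction between any pair of registered protocols.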
A Virtual Clone of Your Device In The Cloud
Consider the scenarios below:
1) Your application would like to read the state of a specific IoT device – say, to display this information to the user.
2) Your application would like to modify the state of a specific IoT device – say, to control an actuator connected to that device.
The application would like to do so with minimal latency and without having to ‘busy wait’ for the device (as the device connectivity may be intermittent) or without waiting for a slow round-trip to the device. This is where device shadows can help.
A device shadow is a virtual representation of the device in the cloud. It is an object (stored in the cloud DB) which holds the ‘last reported state’ of the device. The device periodically sends its own state information to keep the device shadow up-to-date. Applications can read from the device shadow instantly, without waiting for a round-trip to the actual device.
When the application wishes to modify the state of a device (say, flip a flag to turn on the light), it can directly modify the ‘desired state’ object in the device shadow. This becomes a fire-and-forget operation from the application’s point of view. When the device next connects to the cloud platform, it can read the ‘desired state’ from the device shadow and invoke its actuators to try and reach that desired state.
This offers a convenient decoupling to build your web and mobile applications which only have to interact with the device shadow (without having to deal with the actual device itself). It simplifies the programming model of those applications as they no longer have to deal with device-specific protocols or to busy-wait for device availability. Fire-and-forget!
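The interplay between reported and desired state can be sketched as follows. This Python model is a simplification with hypothetical names (DeviceShadow, report, request, delta). Real platforms typically expose the shadow as a JSON document with versioning, which is omitted here.

```python
from typing import Any, Dict


class DeviceShadow:
    """Toy shadow holding the last reported state and the desired state."""

    def __init__(self) -> None:
        self.reported: Dict[str, Any] = {}
        self.desired: Dict[str, Any] = {}

    def report(self, state: Dict[str, Any]) -> None:
        # Called by the device whenever it is online.
        self.reported.update(state)

    def request(self, state: Dict[str, Any]) -> None:
        # Called by the application; returns without contacting the device.
        self.desired.update(state)

    def delta(self) -> Dict[str, Any]:
        # Desired values the device has not yet reached.
        return {k: v for k, v in self.desired.items() if self.reported.get(k) != v}


shadow = DeviceShadow()
shadow.report({"light": "off", "temperature_c": 21.0})  # device syncs, then goes offline

shadow.request({"light": "on"})         # application: fire-and-forget
state_seen_by_app = dict(shadow.reported)  # instant read, still the last reported state

pending = shadow.delta()   # device reconnects and asks what changed
shadow.report(pending)     # device drives its actuator, then reports the new state
remaining = shadow.delta()
print(state_seen_by_app, pending, remaining)
```

Until the device reports back, the application still sees the light as off. The gap between desired and reported state is exactly the work the device has to do when it next connects.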
Thus far, we have explored the foundational concepts and capabilities of IoT Cloud Platforms. The design patterns discussed here help build a robust and secure architecture for your IoT solutions. To recap, the patterns are:
· Asset Registry
· Client Certificates
· Device Grouping
· Pub-Sub Communications Model
· Device Policies
· Rules Engine
· Stream Processing
· Protocol Gateway
· Device Shadow
Solution integrators today have a lot of Cloud Platform options to choose from. This is a constantly evolving and innovating space. We recommend doing a thorough assessment of the platform capabilities using the parameters discussed here before choosing a specific IoT platform for your next project.