YAML in Embedded and Electronics Engineering: From Build Manifests to Home Automation

YAML is a human-readable data serialization language built for hand-edited configuration and structured data. This post covers its origins, how it compares syntactically to JSON, XML, and INI for the same configuration, how the three structured formats handle data typing, and the concrete places — from Zephyr build manifests to Home Assistant automation files — where embedded and electronics engineers are required to work with it.

Introduction

Configuration has migrated out of source code. Build systems, continuous integration pipelines, hardware description schemas, and device provisioning increasingly externalize their settings into declarative text files that tooling reads and engineers edit. YAML (a recursive acronym for "YAML Ain't Markup Language") has become one of the dominant formats for that role. For an embedded developer this is no longer a peripheral software-team concern: a Zephyr RTOS project — where RTOS denotes a real-time operating system with deterministic scheduling — is configured through YAML manifests and YAML hardware-binding schemas before any C is compiled, and a home-automation deployment built on Home Assistant is described almost entirely in YAML. The format's apparent triviality is deceptive. Its whitespace sensitivity and implicit type-conversion rules produce a class of defects that surface at build time or, less visibly, at runtime. Understanding why YAML exists and where its edges lie is therefore a practical matter.

What YAML Actually Is

YAML is a data serialization language: a textual representation of structured data built from three primitives — scalars (single values such as strings, numbers, booleans), sequences (ordered lists), and mappings (unordered key–value pairs). It is commonly used for configuration files and in applications where data is stored or transmitted. Structure is expressed through indentation rather than brackets or tags, which is the source of both its readability and its most common failure mode.

A representative fragment:

# Firmware build profile (mapping at the top level)
target:
  board: nrf52840dk          # scalar string
  optimization: -Os          # compiler flag
  features:                  # a sequence of scalars
    - bluetooth
    - usb_cdc
  clock_hz: 64000000         # scalar integer

A major design objective of YAML 1.2 was to make the language a superset of JSON: every syntactically valid JSON document is also valid YAML, so YAML can be read as "JSON with optional human-friendly sugar." The current specification is YAML 1.2, revision 1.2.2, published in October 2021. Libraries differ in which version they implement: some target YAML 1.2, while several in widespread use (PyYAML among them) retain YAML 1.1 typing behavior, which matters for reasons covered below.
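
Because of that superset relationship, JSON-style flow syntax and indentation-based block syntax can be mixed in one file. The fragment below is a small sketch of the idea; the limits key and its values are invented for illustration and do not come from any real build profile.

```yaml
# Block style: structure comes from indentation
target:
  board: nrf52840dk
  # Flow style: a JSON-like inline sequence
  features: [bluetooth, usb_cdc]

# A literal JSON object is also a valid YAML mapping
# (hypothetical key and values, for illustration only)
limits: {"stack_bytes": 4096, "heap_bytes": 16384}
```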

Why It Appeared

YAML's initial release was in May 2001. It emerged in reaction to the verbosity of XML, which dominated structured-data interchange at the time. XML's closing-tag redundancy and attribute/element ambiguity made hand-editing tedious for configuration work. YAML's designers optimized for the opposite priority: a format that maps cleanly onto the native data structures of dynamic programming languages and that a person can read and modify without tooling. JSON arrived in the same general era but prioritized machine generation over hand-editing — no comments, mandatory quoting, no anchors for reuse.
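
The anchor mechanism mentioned above is worth seeing once, since JSON has no equivalent. In this sketch an anchor (&) names a node and an alias (*) reuses it; the profile names, flags, and log levels are hypothetical.

```yaml
# Define the shared flag list once and name it with an anchor
common_cflags: &common_cflags
  - -Os
  - -Wall

debug:
  cflags: *common_cflags      # alias: refers to the list defined above
  log_level: 4

release:
  cflags: *common_cflags      # one place to edit, no copy-paste
  log_level: 1
```

A loader resolves each alias to the anchored node, so both profiles see the same two flags.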

The Same Configuration in Four Formats

The clearest way to see the trade-offs is to express one configuration — a device record with key–value pairs, a nested mapping, and a list — in each format.

YAML:

device:
  name: sensor-hub      # string
  enabled: true         # boolean
  port: 1883            # integer
  sample_rate: 0.5      # float
  channels:             # list of scalars
    - temperature
    - humidity
  mqtt:                 # nested key-value structure
    host: 192.168.1.10
    qos: 1

JSON — quotes and braces are mandatory; comments are not permitted:

{
  "device": {
    "name": "sensor-hub",
    "enabled": true,
    "port": 1883,
    "sample_rate": 0.5,
    "channels": ["temperature", "humidity"],
    "mqtt": { "host": "192.168.1.10", "qos": 1 }
  }
}

XML — structure is explicit but verbose, and a list requires an invented element name:

<device>
  <name>sensor-hub</name>
  <enabled>true</enabled>
  <port>1883</port>
  <sampleRate>0.5</sampleRate>
  <channels>
    <channel>temperature</channel>
    <channel>humidity</channel>
  </channels>
  <mqtt>
    <host>192.168.1.10</host>
    <qos>1</qos>
  </mqtt>
</device>

INI — flat by nature, so nesting is faked with a sub-section header and a list becomes a delimited string the application must split itself:

[device]
name = sensor-hub
enabled = true
port = 1883
sample_rate = 0.5
; no native list; comma-joined by convention
channels = temperature, humidity

; nesting is a naming convention, not structural
[device.mqtt]
host = 192.168.1.10
qos = 1

The progression is instructive: INI cannot natively express either nesting or lists; XML expresses both at a high character cost; JSON is compact but unforgiving for hand-editing; YAML is the most economical for a human to read and modify.

How the Structured Formats Handle Data Types

The deeper difference is not syntax but how each format decides what type a value has — the point at which silent defects originate.

| Aspect | YAML | JSON | XML |
| --- | --- | --- | --- |
| How type is determined | Implicit resolution from scalar form, with optional explicit tags | Determined syntactically and unambiguously | No native types; all content is text |
| String vs. number | Bare scalar may coerce; quoting forces a string | Quotes mean string, bare means number | Both are text; a schema or the consumer decides |
| Boolean | true/false (and, under 1.1 rules, yes/no/on/off) | true/false only | Text such as true, typed only via xs:boolean |
| Null | null, ~, or empty | null | Empty element or xsi:nil |
| Explicit override | Tags such as !!str 42 or !!int | None available | XSD type declarations |
| Ambiguity risk | High (implicit coercion) | Low | Low at content level; typing deferred to schema |

In JSON, type follows directly from syntax: a quoted token is a string, a bare numeric token is a number, and true, false, and null are the only other bare literals, so there is no path by which "NO" becomes anything but the two-character string. In XML, every element and attribute value is fundamentally text; typing is external, supplied by an XML Schema (XSD) declaration such as xs:integer or imposed by the consuming application. YAML occupies the awkward middle: it infers type from the scalar's appearance, which is convenient when editing by hand but introduces the format's signature trap. This implicit resolution is why explicit quoting is the standard defense when a value's type is load-bearing:

country_code: "NO"      # quoted -> string, not boolean false (the "Norway problem")
firmware_rev: "1.10"    # quoted -> preserves the trailing zero, not float 1.1
mac_suffix: "22:22"     # quoted -> avoids base-60 (sexagesimal) coercion under 1.1 rules

YAML 1.2 narrowed the boolean set, but because several common parsers still apply 1.1 semantics, the defensive quoting habit remains warranted regardless of the declared version.
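
For contrast, the sketch below shows the same values left unquoted, together with the explicit tags listed in the comparison table. The resolved types in the comments assume a loader that applies YAML 1.1 rules (PyYAML is one example); a strict YAML 1.2 core-schema loader would keep NO and 22:22 as strings. The key names are illustrative.

```yaml
# Unquoted: the loader infers the type from the scalar's shape
country_code: NO          # YAML 1.1: boolean false
firmware_rev: 1.10        # float 1.1, trailing zero lost
mac_suffix: 22:22         # YAML 1.1: integer 1342 (22*60 + 22)
spare_channel: ~          # null

# Explicit tags override the inference
serial: !!str 42          # string "42"
retries: !!int "3"        # integer 3
```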

Where Embedded and Electronics Engineers Encounter It

YAML has moved well inside the embedded and device toolchain.

  • Zephyr RTOS build configuration. The west meta-tool manages dependencies through a YAML manifest, west.yml, in which each external project can be pinned to a specific revision for reproducible builds. More relevant to hardware work: Zephyr devicetree bindings are YAML files in a custom format, and Zephyr does not use the dt-schema tools used by the Linux kernel. A binding is the YAML schema that declares what a devicetree node with a given compatible string may contain; the build system uses it to validate the node and to generate the C macros that drivers read. The devicetree description itself (.dts/.overlay) uses a separate, non-YAML syntax.
  • Device configuration via ESPHome. Frameworks such as ESPHome express an entire device's behavior — pins, sensors, networking, automations — declaratively in YAML, typically with no hand-written firmware.
  • Home Assistant. The open-source home-automation platform centers its configuration on YAML: the main configuration.yaml, plus automations, scripts, scenes, and template sensors. It extends standard YAML with custom tags — for example, !include to split configuration across files and !secret to pull credentials from a separate secrets.yaml — and embeds the Jinja2 templating language inside YAML strings for dynamic values. Reusable automation templates ("blueprints") are distributed as YAML. The practical consequence for an electronics engineer is a single declarative language spanning both ends of a project: a custom ESP32-based sensor is described in YAML at the device level and integrated into Home Assistant through YAML at the platform level.
  • Firmware CI/CD and tooling. Pipelines on hosted CI systems such as GitHub Actions and GitLab CI are defined in YAML, and several formatters and linters are configured in YAML as well (clang-format reads its settings from a .clang-format file, for instance), placing the format directly in the path of routine C and C++ work.
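
To make the list above concrete, the listing below sketches four small files of the kinds just described. They are separate files in practice and are joined here with --- document separators only to fit one listing. The file names, the acme vendor prefix, the device and entity names, the GPIO pin, and the pinned revision are all assumptions chosen for illustration, and each fragment should be checked against the current documentation of the tool it targets.

```yaml
# west.yml: Zephyr workspace manifest (illustrative revision)
manifest:
  remotes:
    - name: zephyrproject-rtos
      url-base: https://github.com/zephyrproject-rtos
  projects:
    - name: zephyr
      remote: zephyrproject-rtos
      revision: v3.6.0          # pinned for reproducible builds
      import: true
  self:
    path: app
---
# dts/bindings/acme,temp-sensor.yaml: Zephyr devicetree binding
# (hypothetical vendor and device)
description: Hypothetical I2C temperature sensor
compatible: "acme,temp-sensor"
include: i2c-device.yaml
properties:
  sample-interval-ms:
    type: int
    default: 1000
    description: Sampling period in milliseconds
---
# sensor-hub.yaml: ESPHome device description
esphome:
  name: sensor-hub
esp32:
  board: esp32dev
wifi:
  ssid: !secret wifi_ssid       # resolved from secrets.yaml
  password: !secret wifi_password
api:
sensor:
  - platform: dht
    pin: GPIO4
    temperature:
      name: "Hub Temperature"
    humidity:
      name: "Hub Humidity"
    update_interval: 60s
---
# configuration.yaml: Home Assistant
homeassistant:
  name: Home
  latitude: !secret home_latitude
  longitude: !secret home_longitude
automation: !include automations.yaml
template:
  - sensor:
      - name: "Hub Temperature F"
        unit_of_measurement: "°F"
        # Jinja2 template embedded in a quoted YAML string
        state: "{{ states('sensor.hub_temperature') | float(0) * 9 / 5 + 32 }}"
```

Note that the template is quoted: an unquoted scalar beginning with { would be read as the start of a YAML flow mapping rather than as a string.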

A useful counter-example: not every configuration file is YAML. PlatformIO uses an INI-format platformio.ini, and KiCad stores schematics and board layouts as S-expressions. Identifying the format before editing avoids a category of confusing parse errors.

Conclusion

YAML is a deliberately human-oriented serialization language whose strengths and weaknesses both follow from that single priority. It is the appropriate choice when a file is primarily edited and reviewed by people — build manifests, hardware-binding schemas, CI definitions, and declarative device or home-automation configuration, precisely the artifacts now sitting at the entry point of embedded projects. Its comment support, reuse via anchors, and clean nesting are genuine advantages over JSON for that purpose.

It is the wrong choice for high-frequency machine-to-machine serialization, for performance- or memory-constrained parsing on the device itself, and for any path that ingests untrusted input without a safe loader and schema validation; there, JSON's strict and simple parsing — or a constrained format such as INI — is the sounder selection. The differences in type handling are the decisive factor in practice: JSON removes ambiguity by construction, XML defers typing to an explicit schema, and YAML infers it from appearance. The practical takeaway is therefore narrow but consequential — treat YAML's simplicity as a surface property, quote any scalar whose type matters, confirm the parser's version semantics, and validate against a schema wherever a machine, rather than a person, is the consumer.
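
As one possible shape for the schema validation recommended above, the sketch below expresses a JSON Schema in YAML for the device record used earlier in this post. A JSON Schema is itself plain data, so it can be written in YAML and handed to a validator once the configuration has been loaded. The choice of JSON Schema, the numeric bounds, and the required lists are assumptions, not something a particular toolchain mandates.

```yaml
$schema: "https://json-schema.org/draft/2020-12/schema"
type: object
required: [device]
properties:
  device:
    type: object
    required: [name, port]
    additionalProperties: false     # reject misspelled keys
    properties:
      name: {type: string}
      enabled: {type: boolean}
      port: {type: integer, minimum: 1, maximum: 65535}
      sample_rate: {type: number, exclusiveMinimum: 0}
      channels:
        type: array
        items: {type: string}
      mqtt:
        type: object
        properties:
          host: {type: string}
          qos: {type: integer, enum: [0, 1, 2]}
```

With a schema like this in place, a name that a YAML 1.1 loader had silently coerced to a boolean would fail the type: string check at validation time instead of reaching the device.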

References / Further Reading

  1. O. Ben-Kiki, C. Evans, and I. döt Net, YAML Ain't Markup Language (YAML™) Version 1.2, Revision 1.2.2. The YAML Project, Oct. 2021. [Online]. Available: https://yaml.org/spec/1.2.2/
  2. T. Bray, Ed., The JavaScript Object Notation (JSON) Data Interchange Format, IETF RFC 8259, Dec. 2017. [Online]. Available: https://www.rfc-editor.org/info/rfc8259
  3. Zephyr Project, "Devicetree bindings," Zephyr Project Documentation. [Online]. Available: https://docs.zephyrproject.org/latest/build/dts/bindings.html
  4. Home Assistant, "YAML — Configuration," Home Assistant Documentation. [Online]. Available: https://www.home-assistant.io/docs/configuration/yaml/