Updating data structures in a live production environment is one of the most dangerous tasks in data engineering. When you change a field in a Kafka topic, you risk crashing every downstream application that relies on that data. This challenge is known as schema evolution in Avro, and it is a central component of modern data governance.
By using the Confluent Schema Registry, you can enforce strict compatibility rules that act as a safety net. This ensures that producers cannot publish "poison pills"—messages that follow a new schema that older consumers cannot understand. In this guide, you will learn how to configure your registry, choose the right compatibility levels, and update your Avro schemas without causing a single second of downtime.
TL;DR — To evolve schemas safely, always provide default values for new fields, use the BACKWARD or FULL compatibility modes in the Confluent Schema Registry, and validate your schema changes against the registry using build-time plugins before deploying code.
The Core Concept of Schema Evolution
Apache Avro is a binary serialization format that relies on schemas defined in JSON. Unlike JSON or XML, Avro does not store field names within the data records themselves to save space. Instead, it stores a fingerprint or ID that points to a specific schema version. This makes the Confluent Schema Registry an essential piece of infrastructure; it stores the history of these schemas and provides them to consumers at runtime.
Schema evolution is the process of modifying the schema while maintaining the ability for producers and consumers to communicate. This is achieved through Compatibility Rules. The registry checks every new schema version against previous versions. If the new schema violates the set rule (e.g., deleting a required field in a BACKWARD compatible mode), the registry rejects the update, preventing the producer from breaking the pipeline.
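The rule-checking idea can be sketched in plain Python. This is a simplified model, not the registry's actual implementation: for BACKWARD compatibility, every field the new (reader) schema adds must carry a default, while deleted fields are harmless because the new reader simply ignores them in old data.

```python
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Simplified BACKWARD check: a new (reader) schema can read data written
    with the old schema only if every field it adds has a default value.
    (The real registry also checks types, promotions, unions, and more.)"""
    old_fields = {f["name"] for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        if field["name"] not in old_fields and "default" not in field:
            return False  # reader has no way to fill the missing field
    return True

old = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "int"}]}

good = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "int"},
    {"name": "phone", "type": ["null", "string"], "default": None}]}

bad = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "int"},
    {"name": "phone", "type": "string"}]}  # no default -> rejected

print(is_backward_compatible(old, good))  # True
print(is_backward_compatible(old, bad))   # False
```

A registry doing a transitive check simply runs this kind of comparison against every registered version instead of only the latest one.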
When to Evolve Your Schema
You typically face schema evolution in three real-world scenarios. First, when adding new business requirements, such as capturing a user's middle name or a new tracking ID. In this case, you add a field. If you do this correctly by providing a default value, old consumers will simply ignore the new field, while new consumers can start using it immediately.
Second, schema evolution occurs during data cleanup. You might want to remove a deprecated field that is no longer being populated. This is more dangerous: if a consumer still expects that field and its schema has no default for it, the consumer will fail once the field stops arriving. Using FORWARD compatibility ensures that data produced with the new schema can still be read by consumers on the previous schema version; in practice, this means you may only delete optional fields (fields that carry a default).
Third, you might need to change a data type, such as moving from an int to a long to support larger IDs. This is the most delicate kind of evolution. Avro does allow certain primitive promotions (int to long, float to double), but they only work in one direction: a new reader can promote old int data to long, while an old reader cannot decode the wider type, so the change is backward compatible but not forward compatible and will be rejected under FULL rules. In high-scale environments using Confluent Platform 7.5 or later, enforcing these rules at the topic level is the standard way to ensure that hundreds of microservices don't break simultaneously when a shared data model changes.
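Avro's primitive promotion rules, which decide whether such a type change is readable, can be modeled with a tiny lookup table. This is a sketch of just the numeric promotions from the Avro spec; real resolvers also handle records, unions, strings, and bytes.

```python
# Avro's legal primitive promotions: writer type -> types a reader may use.
PROMOTIONS = {
    "int": {"int", "long", "float", "double"},
    "long": {"long", "float", "double"},
    "float": {"float", "double"},
    "double": {"double"},
}

def can_read(writer_type: str, reader_type: str) -> bool:
    """True if a reader with reader_type can decode data written as writer_type."""
    return reader_type in PROMOTIONS.get(writer_type, {writer_type})

print(can_read("int", "long"))  # True: a new long reader handles old int data
print(can_read("long", "int"))  # False: an old int reader breaks on new long data
```

The asymmetry in the output is exactly why widening a type is backward compatible but not forward compatible.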
Step-by-Step Implementation
Step 1: Set the Compatibility Level
The Confluent Schema Registry defaults to BACKWARD compatibility. However, for most enterprise pipelines, FULL_TRANSITIVE is the safest choice, as it ensures the new schema is compatible with all previous versions, not just the latest one. Use the following curl command to set the compatibility level for a specific subject (here, the value schema of my-topic):
curl --request PUT \
  --url http://localhost:8081/config/my-topic-value \
  --header 'Content-Type: application/vnd.schemaregistry.v1+json' \
  --data '{
    "compatibility": "FULL_TRANSITIVE"
  }'
Step 2: Define the New Avro Schema
When adding a field, you must include a default value. Without a default value, the registry will reject the schema under BACKWARD or FULL rules because old versions of the code wouldn't know how to handle the missing data in new records. Here is an example of an evolved user schema:
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "email", "type": "string"},
    {
      "name": "phone_number",
      "type": ["null", "string"],
      "default": null
    }
  ]
}
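To see why the default matters, here is a toy model of Avro's schema resolution (records modeled as plain dicts, no binary decoding): for each field the reader expects, it takes the writer's value if present and otherwise falls back to the schema default.

```python
def resolve_record(record: dict, reader_fields: list) -> dict:
    """Toy model of Avro schema resolution: take the writer's value when
    present, otherwise fall back to the reader schema's default. A missing
    field with no default is an unresolvable error."""
    out = {}
    for field in reader_fields:
        if field["name"] in record:
            out[field["name"]] = record[field["name"]]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value and no default for {field['name']!r}")
    return out

reader_fields = [
    {"name": "id", "type": "int"},
    {"name": "email", "type": "string"},
    {"name": "phone_number", "type": ["null", "string"], "default": None},
]

# A record written with the *old* schema, before phone_number existed:
old_record = {"id": 7, "email": "a@example.com"}
print(resolve_record(old_record, reader_fields))
# -> {'id': 7, 'email': 'a@example.com', 'phone_number': None}
```

Remove the `"default"` key from `phone_number` and the same call raises an error, which is precisely the failure the registry's compatibility check exists to prevent.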
Step 3: Validate and Register via Maven
Never register schemas manually in production. Use the kafka-schema-registry-maven-plugin to validate the schema during your CI/CD pipeline. This prevents "broken" schemas from ever reaching your registry. Add this to your pom.xml:
<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>7.5.0</version>
  <configuration>
    <schemaRegistryUrls>
      <param>http://localhost:8081</param>
    </schemaRegistryUrls>
    <subjects>
      <my-topic-value>src/main/avro/user.avsc</my-topic-value>
    </subjects>
  </configuration>
  <executions>
    <execution>
      <id>validate</id>
      <phase>test</phase>
      <goals>
        <goal>test-compatibility</goal>
      </goals>
    </execution>
  </executions>
</plugin>
Common Pitfalls and Fixes
A common pitfall is adding a new symbol to an ENUM type: any old consumer that encounters the new value will throw a SerializationException. Avro does not handle unknown enum symbols gracefully unless you use specific Avro 1.10+ features or use strings instead of enums.
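If you must keep an enum, newer Avro versions support an enum-level default attribute: a reader that encounters a symbol it does not know resolves it to this fallback instead of failing. A sketch of the field definition (verify that both your producer and consumer Avro libraries support it before relying on it):

```json
{
  "name": "status",
  "type": {
    "type": "enum",
    "name": "Status",
    "symbols": ["ACTIVE", "INACTIVE", "UNKNOWN"],
    "default": "UNKNOWN"
  }
}
```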
Another frequent issue is renaming a field. In Avro, a rename is technically a "delete" of the old field and an "add" of the new field. This breaks BACKWARD compatibility. To fix this, you should use aliases. Aliases allow the reader to map the old field name to the new field name during deserialization. This maintains compatibility without duplicating data in the binary payload.
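For example, a reader schema that renames a field while staying compatible with data written under the old name could look like this (the field names are illustrative):

```json
{
  "name": "email",
  "type": "string",
  "aliases": ["email_address"]
}
```

During deserialization, data written with the old `email_address` field is mapped onto `email`, so no extra bytes are ever stored on the wire.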
When using union types (for example, a field that can be a string or null), the default value must match the first type in the union. If your default is null, the union must be ["null", "string"]. If you flip it to ["string", "null"], the schema will be rejected because, per the Avro specification, a union's default is always interpreted against the first branch.
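This ordering rule is easy to lint for before a schema ever reaches the registry. A minimal sketch, covering only the common case of a null default:

```python
def union_null_default_ok(field: dict) -> bool:
    """Avro rule: a union field's default must match the union's FIRST branch.
    This sketch only checks the common case of a null default."""
    branches = field.get("type")
    if isinstance(branches, list) and field.get("default", "missing") is None:
        return branches[0] == "null"
    return True  # not a union with a null default; nothing to check here

ok = {"name": "phone", "type": ["null", "string"], "default": None}
bad = {"name": "phone", "type": ["string", "null"], "default": None}

print(union_null_default_ok(ok))   # True
print(union_null_default_ok(bad))  # False: null default behind a string branch
```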
Best Practices for Long-term Governance
To keep your data pipelines healthy as they scale, follow these tips. First, prefer FULL_TRANSITIVE compatibility where you can. While it is more restrictive, it guarantees that any consumer, regardless of how old its version is, can read any message in the topic, and it catches incompatibilities that a check against only the latest version would miss.
Second, implement a "Schema First" development workflow. Developers should commit the .avsc file and have it reviewed by a data architect before any producer code is written. This prevents the "accidental" creation of complex nested structures that are difficult to evolve later. For more on this, check out our guide on Data Governance Strategies for Kafka.
- Default values are mandatory for adding/removing fields in evolved schemas.
- Use the Confluent Maven plugin to catch compatibility errors in CI/CD.
- Prefer BACKWARD compatibility if you upgrade consumers first; FORWARD if you upgrade producers first.
- Aliases are the only safe way to rename fields without breaking existing consumers.
Finally, monitor your Schema Registry metrics. Keep an eye on the number of unique schemas per subject. If you see a subject with hundreds of versions, it may indicate that your data model is too volatile and should be split into multiple topics. For further reading, see our article on Kafka Topic Naming and Design Best Practices.
Frequently Asked Questions
Q. What is the difference between BACKWARD and FORWARD compatibility?
A. BACKWARD compatibility means new code can read data written by old code (essential for upgrading consumers). FORWARD compatibility means old code can read data written by new code (essential for upgrading producers). FULL compatibility is both combined.
Q. Why does Avro require default values for schema evolution?
A. Avro uses the default value to fill in gaps when the reader's schema and the writer's schema don't match. If a reader expects a field that isn't in the binary record, it looks for the default in the schema. Without it, the process fails.
Q. Can I change a field type from string to int in Avro?
A. No, this is generally not a compatible change. To change a type, you should add a new field with the new type, produce to both for a transition period, and then deprecate the old field once all consumers are updated.
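A sketch of that transition period in producer code (the field names id and id_long are hypothetical, as is the clamping policy): during the migration window, the producer populates both fields, so old consumers keep reading id while updated consumers switch to id_long.

```python
def build_user_record(raw_id: int) -> dict:
    """Transition-period record: write BOTH the legacy int field and the
    new long field so old and new consumers keep working.
    (Field names and the clamp-to--1 policy are illustrative choices.)"""
    record = {"id_long": raw_id}  # new 64-bit field, the long-term home
    # Keep the legacy 32-bit field populated while any consumer still reads
    # it; use a sentinel if an ID has already outgrown the int range.
    record["id"] = raw_id if raw_id < 2**31 else -1
    return record

print(build_user_record(42))             # both fields carry 42
print(build_user_record(5_000_000_000))  # legacy field falls back to -1
```

Once monitoring shows no consumer reads the legacy field, you remove it with a FORWARD-compatible (or FULL-compatible) schema change and stop populating it.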