Why Avro and Kafka Go Together
Apache Kafka moves enormous amounts of data between services. JSON works fine at small scale, but at millions of messages per second, you start hitting real problems: wasted bandwidth from repeating field names, no schema enforcement, and breaking changes that silently corrupt downstream consumers.
Apache Avro solves these. It's the most popular serialization format for Kafka — especially in the Confluent ecosystem — because it offers compact binary encoding, schema-enforced data, and built-in support for schema evolution without breaking existing consumers. If you're new to Avro itself, start with What is Apache Avro? before continuing here.
Smaller Messages
Binary encoding without field names. Avro messages are typically 50–70% smaller than equivalent JSON, reducing Kafka storage and network costs.
Schema Enforcement
Every message is validated against a schema at publish time. Malformed data never enters your Kafka topic.
Safe Evolution
Add or remove fields without breaking existing consumers. Schema Registry enforces compatibility rules before any change goes live.
The Problem with JSON in Kafka
Here's a concrete example of why JSON causes pain at scale:
// JSON message — every record repeats field names
{"userId": 12345, "event": "purchase", "amount": 99.99, "currency": "USD"}
{"userId": 67890, "event": "purchase", "amount": 24.99, "currency": "USD"}
// At 10 million messages/day: ~100MB just for field name repetition

// Avro message — schema stored once in Schema Registry, data is binary
// Same two records: ~40% smaller, type-safe, schema-validated
With Avro, the schema lives once in Schema Registry. Each Kafka message carries only a small schema ID (4 bytes) and the binary-encoded field values. Consumers look up the schema ID to deserialize — no field names in every message.
What is Schema Registry?
Confluent Schema Registry is a centralized service that stores and manages Avro schemas for your Kafka topics. Think of it as a schema version control system.
Producer registers schema
The Avro serializer sends the schema to Schema Registry on first use. If compatible with previous versions, it gets a numeric ID (e.g., id=3).
Message is published with schema ID
Each Kafka message starts with a magic byte (0x00), 4-byte schema ID, then binary Avro data. No schema embedded in the message.
Consumer fetches schema and deserializes
The Avro deserializer reads the schema ID, fetches the schema from Registry (cached after first call), and deserializes the binary data back to a record.
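To make that framing concrete, here is a minimal Python sketch that splits a Confluent-framed message value into its schema ID and Avro payload. The helper name and the sample bytes are illustrative, not part of the Confluent client.
import struct

def split_confluent_frame(value: bytes):
    """Split a Confluent-framed Kafka value into (schema_id, avro_payload).

    Frame layout: magic byte 0x00, 4-byte big-endian schema ID, then binary Avro data.
    """
    if len(value) < 5 or value[0] != 0x00:
        raise ValueError("Not a Schema Registry framed message")
    schema_id = struct.unpack(">I", value[1:5])[0]
    return schema_id, value[5:]

# Illustrative frame: schema ID 3 followed by a few Avro-encoded bytes
schema_id, payload = split_confluent_frame(b"\x00\x00\x00\x00\x03\x02\x0aAlice")
print(schema_id)  # 3 -- the consumer fetches this schema from the registry
print(payload)    # raw Avro binary, decoded with the fetched schema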
The key benefit: Schema Registry enforces compatibility rules. If you try to register a schema that breaks existing consumers, the registry rejects it before it reaches Kafka.
# Register a schema via REST API
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}"}' \
  http://localhost:8081/subjects/users-value/versions
# Response: {"id": 1}
Schema Evolution in Kafka
Schema evolution is the ability to change your schema over time without breaking existing producers or consumers. It's one of Avro's biggest strengths — and a core reason why it pairs so well with Kafka. If you need a deep dive into schema types and structure first, the Avro Schema Guide covers all primitives, complex types, and field rules. Schema Registry enforces evolution through three core compatibility levels:
| Compatibility | What it means | Safe upgrade order |
|---|---|---|
| Backward | New schema reads old data | Upgrade consumers first |
| Forward | Old schema reads new data | Upgrade producers first |
| Full | Both directions work | Upgrade in any order |
Safe vs Breaking Changes
✓ Safe changes
- Add optional field with a default value
- Remove a field that had a default value
- Add a type to a union (with care)
- Change field order (Avro uses names, not positions)
- Add an alias to an existing field
✗ Breaking changes
- Remove a required field (no default)
- Add a required field without default
- Change a field's type (e.g. int → string)
- Rename a field without an alias
- Change the record name or namespace
// Adding an optional field — backward AND forward compatible
{
  "type": "record",
  "name": "User",
  "namespace": "com.example.avro",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "name", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "age", "type": ["null", "int"], "default": null }
  ]
}
The new age field uses a union type ["null", "int"] with default: null. Old consumers that don't know about age will simply ignore it. New consumers reading old data will use the default null. For more ready-to-use schema patterns like this, see Avro Schema Examples.
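To see this resolution happen locally, the short sketch below uses the fastavro library (an extra dependency, not the Confluent client used later in this guide) to write a record with the old schema and read it back with the new one, so the missing age field is filled in from its default.
import io
from fastavro import schemaless_writer, schemaless_reader

# The schema before the change (no age field)
old_schema = {
    "type": "record", "name": "User", "namespace": "com.example.avro",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
    ],
}

# The evolved schema adds an optional age with a default
new_schema = {
    "type": "record", "name": "User", "namespace": "com.example.avro",
    "fields": old_schema["fields"] + [
        {"name": "age", "type": ["null", "int"], "default": None},
    ],
}

# Encode a record with the OLD (writer) schema
buf = io.BytesIO()
schemaless_writer(buf, old_schema, {"id": 1, "name": "Alice", "email": "[email protected]"})
buf.seek(0)

# Decode with the NEW (reader) schema: Avro fills age from its default
record = schemaless_reader(buf, old_schema, new_schema)
print(record)  # {'id': 1, 'name': 'Alice', 'email': '[email protected]', 'age': None}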
Tip: Use our Avro Schema Compatibility Checker to verify backward, forward, and full compatibility between your schemas before deploying to Kafka.
Producer and Consumer Setup
Here's how to configure Kafka producers and consumers to use Avro serialization with Schema Registry. These examples use the Confluent Kafka Python client. Before wiring up producers and consumers, use the Avro Schema Validator to confirm your schema is valid, or the Avro Schema Generator to auto-generate one from your JSON data.
Kafka Producer (Python)
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField
# Schema Registry client
schema_registry_conf = {'url': 'http://localhost:8081'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)
# Define your Avro schema
schema_str = """
{
  "type": "record",
  "name": "User",
  "namespace": "com.example.avro",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"}
  ]
}
"""
avro_serializer = AvroSerializer(schema_registry_client, schema_str)
producer_conf = {
    'bootstrap.servers': 'localhost:9092',
}
producer = Producer(producer_conf)
# Produce a message
user = {"id": 1, "name": "Alice", "email": "[email protected]"}
producer.produce(
    topic='users',
    value=avro_serializer(user, SerializationContext('users', MessageField.VALUE))
)
producer.flush()
Kafka Consumer (Python)
from confluent_kafka import Consumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer
from confluent_kafka.serialization import SerializationContext, MessageField
schema_registry_conf = {'url': 'http://localhost:8081'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)
avro_deserializer = AvroDeserializer(schema_registry_client)
consumer_conf = {
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'user-consumer-group',
    'auto.offset.reset': 'earliest'
}
consumer = Consumer(consumer_conf)
consumer.subscribe(['users'])
while True:
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        print(f"Consumer error: {msg.error()}")
        continue
    user = avro_deserializer(
        msg.value(),
        SerializationContext(msg.topic(), MessageField.VALUE)
    )
    print(f"Received: {user}")
Setting Compatibility Level
You can set compatibility per subject (topic) or globally via the Schema Registry REST API:
# Set compatibility for a specific topic's value schema
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "FULL"}' \
  http://localhost:8081/config/users-value
# Options: BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE, NONE
Recommendation: Use FULL for new topics when possible. This gives you the most flexibility in rolling upgrades without coordination between teams.
Avro vs JSON vs Protobuf in Kafka
Quick comparison of the most common serialization choices for Kafka:
| Feature | Avro | JSON | Protobuf |
|---|---|---|---|
| Message size | Small (binary) | Large (text) | Smallest (binary) |
| Schema enforcement | ✓ Built-in | ✗ Manual | ✓ Built-in |
| Schema Registry support | Native | Basic | Supported |
| Schema evolution | Excellent | Manual | Good |
| Human readable | Schema yes, data no | Yes | Schema yes, data no |
| Code generation required | Optional | None | Yes |
| Kafka ecosystem fit | Best | Simple cases | Good |
Avro is the go-to for Kafka because Schema Registry was designed with Avro first. Protobuf is slightly more compact and works well too, especially if you already use it across services. JSON is fine for low-volume internal topics where readability matters more than efficiency. For a deeper comparison, see the Protobuf vs Avro guide.
Best Practices
Always use Schema Registry
Embedding schemas in messages wastes space and breaks versioning. Schema Registry is the correct approach for production Kafka with Avro.
Set compatibility to FULL for critical topics
Full compatibility allows rolling upgrades without coordination. Start strict and relax later if needed — it's harder to tighten after the fact.
Always add defaults to new fields
New fields without defaults break backward compatibility. Use ["null", "type"] unions with default: null for optional fields.
Use namespaces
Always set a namespace (e.g. com.example.events) to avoid name collisions between teams and services.
Check compatibility before deploying
Use our Avro Schema Compatibility Checker or the Schema Registry API /compatibility endpoint to test changes before they go live.
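As a rough sketch of that REST route (assuming the local registry URL and users-value subject from the earlier examples, and that the requests package is installed), a pre-deploy check might look like this:
import json
import requests  # assumption: any HTTP client works; requests is used here for brevity

REGISTRY_URL = "http://localhost:8081"
SUBJECT = "users-value"  # subject name from the earlier examples

# Candidate schema: the evolved User record with the optional age field
candidate_schema = {
    "type": "record", "name": "User", "namespace": "com.example.avro",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": ["null", "int"], "default": None},
    ],
}

resp = requests.post(
    f"{REGISTRY_URL}/compatibility/subjects/{SUBJECT}/versions/latest",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    # The registry expects the schema as an escaped JSON string inside the payload
    data=json.dumps({"schema": json.dumps(candidate_schema)}),
)
resp.raise_for_status()
print(resp.json())  # e.g. {"is_compatible": true}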
Free Avro Tools
Use these tools to work with Avro schemas in your Kafka projects:
Compatibility Checker
Check backward, forward, and full compatibility between schemas
Schema Validator
Validate Avro schema syntax before registering
Schema Generator
Auto-generate Avro schemas from JSON data
Avro Formatter
Format and beautify Avro schemas
JSON to Avro
Convert JSON data to Avro format
Avro Fixer
Fix and repair broken Avro schemas