The Data Model Architecture¶
This document explains the foundational classes that all ADF marks and nodes inherit from, and how they provide the core functionality for serialization, deserialization, and Markdown conversion.
One Definition, One Class¶
Each definition in the ADF JSON schema corresponds to:
One Python module (e.g., node_paragraph.py, mark_strong.py)
One Python class (e.g., NodeParagraph, MarkStrong)
In special cases, multiple schema definitions may map to a single class when they represent variants of the same concept. For example, different media type variants might share the same NodeMedia class.
The class hierarchy is:
Base
└── BaseMarkOrNode
    ├── BaseMark  # For text formatting (strong, em, link, etc.)
    └── BaseNode  # For document structure (paragraph, heading, etc.)
Every mark and node has a type field that identifies what kind of element it is
(e.g., "paragraph", "strong"). The valid type values are defined in the
type_enum module as the TypeEnum enumeration. When
implementing a new dataclass, always use TypeEnum.xxx.value for the type
field’s default value.
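A minimal sketch of that convention follows. The enum here is a two-member stand-in for the library's TypeEnum, and MarkStrongSketch is a hypothetical class name used only for illustration:

```python
from dataclasses import dataclass
from enum import Enum


class TypeEnum(Enum):
    # Stand-in for the library's TypeEnum; only two members are shown.
    paragraph = "paragraph"
    strong = "strong"


@dataclass
class MarkStrongSketch:
    # The default comes from the enum rather than a hand-typed string,
    # so a typo in the type name fails at import time.
    type: str = TypeEnum.strong.value


mark = MarkStrongSketch()
print(mark.type)
```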
Class Hierarchy¶
Base Class¶
The Base class (in atlas_doc_parser.mark_or_node) provides common functionality for all ADF dataclasses:
Key Methods:
from_dict(dct): Construct an instance from a dictionary. Uses defensive programming to ignore unknown fields, ensuring forward compatibility when Atlassian adds new fields.
to_dict(): Serialize the instance back to a dictionary, suitable for JSON encoding.
get_fields(): Returns cached field information for performance.
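The three methods could look roughly like the sketch below. It is not the library's implementation: BaseSketch and LinkMarkSketch are hypothetical names, the cache uses functools.lru_cache as one possible approach, and "futureField" is an invented key standing in for a field added to the schema later.

```python
from dataclasses import asdict, dataclass, fields
from functools import lru_cache
from typing import Optional


@dataclass
class BaseSketch:
    @classmethod
    @lru_cache(maxsize=None)
    def get_fields(cls) -> frozenset:
        # Computed once per class, then served from the cache.
        return frozenset(f.name for f in fields(cls))

    @classmethod
    def from_dict(cls, dct: dict):
        known = cls.get_fields()
        # Unknown keys are dropped instead of raising TypeError.
        return cls(**{k: v for k, v in dct.items() if k in known})

    def to_dict(self) -> dict:
        return asdict(self)


@dataclass
class LinkMarkSketch(BaseSketch):
    type: str = "link"
    attrs: Optional[dict] = None


mark = LinkMarkSketch.from_dict(
    {"type": "link", "attrs": {"href": "https://example.com"}, "futureField": 1}
)
print(mark.to_dict())
```

Filtering on known field names is what keeps older versions of the parser working when the input carries keys they have never seen.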
BaseMark Class¶
The BaseMark class represents text formatting marks like bold, italic, links, etc.
from_dict() Behavior:
Marks have a simpler structure than nodes. The from_dict() method only needs to handle:
The type field (e.g., "strong", "em")
Optional attrs field (e.g., link URL, text color)
It does NOT recursively parse nested content because marks don’t contain child nodes.
to_markdown() Behavior:
def to_markdown(self, text: str) -> str:
    return text  # Default: return text unchanged
The default implementation returns the input text unchanged. This design reflects that:
The primary goal of to_markdown is content extraction, not formatting. In the AI era, we’re converting ADF to Markdown primarily to extract textual content for LLMs, RAG systems, and knowledge bases. Preserving formatting is secondary.
When in doubt, preserve content without formatting. If a mark type doesn’t have a standard Markdown equivalent, we simply return the raw text rather than inventing custom syntax.
Use native Markdown only. We prefer standard Markdown syntax (e.g., **bold**, *italic*). Dialect-specific extensions like GitHub-flavored Markdown tables are avoided when possible.
Subclasses override this method to apply formatting. For example, MarkStrong.to_markdown("text") returns "**text**".
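The override pattern can be sketched with stand-in classes. The names ending in Sketch are hypothetical and are not the library's classes:

```python
from dataclasses import dataclass


@dataclass
class BaseMarkSketch:
    type: str = ""

    def to_markdown(self, text: str) -> str:
        # Default: keep the content, drop the formatting.
        return text


@dataclass
class StrongSketch(BaseMarkSketch):
    type: str = "strong"

    def to_markdown(self, text: str) -> str:
        return f"**{text}**"


@dataclass
class TextColorSketch(BaseMarkSketch):
    # No standard Markdown equivalent, so the default is inherited.
    type: str = "textColor"


print(StrongSketch().to_markdown("text"))
print(TextColorSketch().to_markdown("text"))
```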
BaseNode Class¶
The BaseNode class represents document structure elements like paragraphs, headings, lists, tables, etc.
from_dict() Behavior:
Nodes have a more complex structure. The from_dict() method handles:
The type field (e.g., "paragraph", "heading")
Optional attrs field (node-specific attributes)
Optional content field (child nodes) - recursively parsed
Optional marks field (text formatting) - recursively parsed
The key difference from BaseMark.from_dict() is the recursive parsing of content and marks using parse_node() and parse_mark() respectively.
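The recursion can be illustrated with a simplified sketch. The function names parse_node and parse_mark come from the text above, but their signatures and bodies here are assumptions, and a single generic NodeSketch class stands in for the per-type classes that a real parser would presumably select based on the type field.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MarkSketch:
    type: str
    attrs: Optional[dict] = None


@dataclass
class NodeSketch:
    type: str
    attrs: Optional[dict] = None
    text: Optional[str] = None
    content: list = field(default_factory=list)
    marks: list = field(default_factory=list)


def parse_mark(dct: dict) -> MarkSketch:
    # Marks are flat: no recursion needed.
    return MarkSketch(type=dct["type"], attrs=dct.get("attrs"))


def parse_node(dct: dict) -> NodeSketch:
    return NodeSketch(
        type=dct["type"],
        attrs=dct.get("attrs"),
        text=dct.get("text"),
        # Children are parsed recursively; marks go through parse_mark.
        content=[parse_node(c) for c in dct.get("content", [])],
        marks=[parse_mark(m) for m in dct.get("marks", [])],
    )


doc = parse_node({
    "type": "paragraph",
    "content": [
        {"type": "text", "text": "hi", "marks": [{"type": "strong"}]},
    ],
})
print(doc.content[0].marks[0].type)
```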
to_markdown() Behavior:
def to_markdown(self, ignore_error: bool = False) -> str:
    raise NotImplementedError(
        f"{self.__class__.__name__} has not implemented ``to_markdown()``"
    )
The default implementation raises NotImplementedError. This is intentional:
Fail fast during development. When developing new node types, we want to immediately discover which nodes haven’t implemented to_markdown() rather than silently producing empty output.
The ignore_error parameter provides an escape hatch. In production, users can pass ignore_error=True to gracefully skip nodes that fail to convert. This flag propagates recursively to all nested nodes.
Error handling is explicit. The library user decides whether to fail fast (for debugging) or degrade gracefully (for production).
Subclasses must override this method. The ignore_error parameter should be passed down to any nested to_markdown() calls for consistent behavior.
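The sketch below shows one way the flag could be forwarded and honored. All class names are hypothetical, and catching the exception inside the parent's loop is an assumption about where the skip happens, not a description of the library's code.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSketch:
    content: list = field(default_factory=list)

    def to_markdown(self, ignore_error: bool = False) -> str:
        raise NotImplementedError(
            f"{self.__class__.__name__} has not implemented to_markdown()"
        )


@dataclass
class TextSketch(NodeSketch):
    text: str = ""

    def to_markdown(self, ignore_error: bool = False) -> str:
        return self.text


@dataclass
class UnsupportedSketch(NodeSketch):
    pass  # Inherits the failing default.


@dataclass
class ParagraphSketch(NodeSketch):
    def to_markdown(self, ignore_error: bool = False) -> str:
        parts = []
        for child in self.content:
            try:
                # The flag is forwarded so nested nodes behave the same way.
                parts.append(child.to_markdown(ignore_error=ignore_error))
            except Exception:
                if not ignore_error:
                    raise
        return "".join(parts)


para = ParagraphSketch(content=[TextSketch(text="ok"), UnsupportedSketch()])
print(para.to_markdown(ignore_error=True))
```

Calling para.to_markdown() without the flag re-raises the NotImplementedError from UnsupportedSketch, which is the fail-fast behavior described above.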
Markdown Helper Functions¶
The atlas_doc_parser.markdown_helpers module provides utility functions used by node classes to implement to_markdown().
strip_double_empty_line()¶
def strip_double_empty_line(text: str, n: int = 3) -> str
Purpose: Remove excessive consecutive blank lines from the output.
When converting block-level nodes, many nodes add blank lines before/after themselves to ensure proper Markdown rendering. This can result in 3+ consecutive blank lines, which looks ugly. This function normalizes the output to at most one blank line (two newlines).
Usage: Called internally by doc_content_to_markdown() to clean up the final output.
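The normalization can be illustrated with a regular expression. This is a sketch under a hypothetical name, collapse_blank_lines, and it does not model the n parameter, whose meaning is not described here.

```python
import re


def collapse_blank_lines(text: str) -> str:
    # Three or more consecutive newlines mean two or more blank lines;
    # reduce them to exactly one blank line.
    return re.sub(r"\n{3,}", "\n\n", text)


raw = "# Title\n\n\n\nFirst paragraph.\n\n\nSecond paragraph.\n"
clean = collapse_blank_lines(raw)
print(repr(clean))
```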
content_to_markdown()¶
def content_to_markdown(
    content: list[T_NODE],
    concat: str = "",
    ignore_error: bool = False,
) -> str
Purpose: Recursively convert a list of child nodes to concatenated Markdown.
This is the workhorse function for converting nested content. It:
Iterates through all child nodes
Calls to_markdown() on each
Concatenates the results
Usage: Used by most nodes that have a content field (paragraphs, headings, list items, etc.).
Example:
# In NodeParagraph.to_markdown():
md = content_to_markdown(self.content, ignore_error=ignore_error)
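A simplified stand-in for the function makes the iterate, convert, and join steps concrete. The name content_to_markdown_sketch and the TextSketch class are hypothetical, and error skipping is left out for brevity:

```python
from dataclasses import dataclass


@dataclass
class TextSketch:
    text: str

    def to_markdown(self, ignore_error: bool = False) -> str:
        return self.text


def content_to_markdown_sketch(
    content: list,
    concat: str = "",
    ignore_error: bool = False,
) -> str:
    # Convert every child, forwarding the flag, then join with `concat`.
    return concat.join(
        node.to_markdown(ignore_error=ignore_error) for node in content
    )


children = [TextSketch("Hello, "), TextSketch("world")]
print(content_to_markdown_sketch(children))
```

The empty-string default for concat suits inline content, where adjacent text nodes should run together without a separator.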
doc_content_to_markdown()¶
def doc_content_to_markdown(
    content: list[T_NODE],
    concat: str = "\n",
    ignore_error: bool = False,
) -> str
Purpose: Convert document-level (root) content to Markdown with proper block separation.
This function is specifically for the NodeDoc root node. It differs from content_to_markdown() in that:
Joins blocks with newlines (not empty string)
Adds extra blank lines around lists and code blocks for proper rendering
Cleans up excessive blank lines in the final output
Usage: Called by NodeDoc.to_markdown() only.
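The three differences can be sketched as follows. This is an illustration under assumptions, not the library's code: BlockSketch and PADDED_TYPES are hypothetical, and the exact padding rules for lists and code blocks are guessed from the description above.

```python
import re
from dataclasses import dataclass


@dataclass
class BlockSketch:
    type: str
    md: str

    def to_markdown(self, ignore_error: bool = False) -> str:
        return self.md


# Block types that get blank lines around them (an assumption for this sketch).
PADDED_TYPES = {"bulletList", "orderedList", "codeBlock"}


def doc_content_to_markdown_sketch(
    content: list,
    concat: str = "\n",
    ignore_error: bool = False,
) -> str:
    parts = []
    for node in content:
        md = node.to_markdown(ignore_error=ignore_error)
        if node.type in PADDED_TYPES:
            md = f"\n{md}\n"  # Pad lists and code blocks.
        parts.append(md)
    joined = concat.join(parts)
    # Padding from adjacent blocks can stack up; collapse it.
    return re.sub(r"\n{3,}", "\n\n", joined)


blocks = [
    BlockSketch("paragraph", "Intro"),
    BlockSketch("bulletList", "- a\n- b"),
    BlockSketch("bulletList", "- c"),
    BlockSketch("paragraph", "Outro"),
]
result = doc_content_to_markdown_sketch(blocks)
print(result)
```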
add_style_to_markdown()¶
def add_style_to_markdown(md: str, node: T_NODE) -> str
Purpose: Apply mark formatting to Markdown text.
When a node has marks (e.g., bold + italic text), this function applies each mark’s formatting in sequence:
Start with plain text: "hello"
Apply strong mark: "**hello**"
Apply em mark: "***hello***"
Usage: Used by nodes that support text formatting (NodeText, NodeHeading, etc.).
Example:
# In NodeText.to_markdown():
md = self.text
md = add_style_to_markdown(md, self)
return md
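The sequential application can be sketched with stand-in classes. The names ending in Sketch are hypothetical, and the loop is an assumption about how the marks are folded over the text:

```python
from dataclasses import dataclass, field


@dataclass
class StrongSketch:
    def to_markdown(self, text: str) -> str:
        return f"**{text}**"


@dataclass
class EmSketch:
    def to_markdown(self, text: str) -> str:
        return f"*{text}*"


@dataclass
class TextSketch:
    text: str
    marks: list = field(default_factory=list)


def add_style_to_markdown_sketch(md: str, node) -> str:
    # Each mark wraps the output of the previous one.
    for mark in node.marks or []:
        md = mark.to_markdown(md)
    return md


node = TextSketch("hello", marks=[StrongSketch(), EmSketch()])
print(add_style_to_markdown_sketch(node.text, node))
```

A node with no marks passes through unchanged, which matches the content-first default described for BaseMark.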
Summary¶
The base classes provide a consistent foundation for all ADF types:
| Feature | BaseMark | BaseNode |
|---|---|---|
| from_dict() | Parses type and attrs | Parses type, attrs, content, and marks (content and marks recursively) |
| to_dict() | Serializes all fields | Serializes all fields (recursive) |
| to_markdown() | Returns text unchanged (subclasses add formatting) | Raises NotImplementedError (subclasses must implement) |
| ignore_error | N/A | Propagates to nested calls for graceful degradation |
The helper functions handle common patterns so that individual node/mark implementations can focus on their specific conversion logic.