mark_or_node

Base classes for ADF data model.

This module provides the foundational classes for deserializing Atlassian Document Format (ADF) JSON into Python objects:

  • Base: Base class for all dataclasses

  • BaseMark: Base class for text marks (formatting)

  • BaseNode: Base class for document nodes

class atlas_doc_parser.mark_or_node.Base[source]

Base class for all ADF dataclasses.

Provides common functionality:

  • from_dict(): Deserialize from dictionary

  • to_dict(): Serialize to dictionary

classmethod get_fields() dict[str, Field][source]

Return a dict view of the dataclasses.Field objects defined on this class. The result is cached to avoid the overhead of repeated dataclasses.fields() calls.
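The caching idea can be sketched as follows. This is a minimal illustration, not the library's code: SketchBase, Point, and the module-level _FIELDS_CACHE are hypothetical names, and the real cache mechanism is not shown in this reference.

```python
import dataclasses
from typing import Dict

# Hypothetical cache keyed by class (an assumption made for this sketch).
_FIELDS_CACHE: Dict[type, Dict[str, dataclasses.Field]] = {}


@dataclasses.dataclass
class SketchBase:
    @classmethod
    def get_fields(cls) -> Dict[str, dataclasses.Field]:
        # Build the name -> Field mapping once per class, then reuse it.
        try:
            return _FIELDS_CACHE[cls]
        except KeyError:
            mapping = {f.name: f for f in dataclasses.fields(cls)}
            _FIELDS_CACHE[cls] = mapping
            return mapping


@dataclasses.dataclass
class Point(SketchBase):
    x: int = 0
    y: int = 0


field_names = list(Point.get_fields())
```

Keying the cache by class keeps each subclass's field mapping separate, so a subclass never sees the mapping computed for its parent.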

to_dict() Dict[str, Any][source]

Convert the dataclass to a complete dictionary with all fields.

to_kwargs() Dict[str, Any][source]

Convert the dataclass to a dictionary suitable for function calls.

classmethod from_dict(dct: Dict[str, Any]) Base[source]

Construct an instance from a dictionary.

Only fields defined in the dataclass will be used.

This is deliberately defensive. The Atlassian Document Format may introduce new fields over time, so input produced by a newer schema can contain keys that an older version of this library does not know about. Ignoring unknown keys keeps deserialization working across such schema changes instead of raising an error.
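A minimal sketch of this filtering, assuming a plain dataclass. SketchAttrs and the futureField key are hypothetical and only illustrate the behavior described above; the library's own implementation may differ.

```python
import dataclasses
from typing import Any, Dict


@dataclasses.dataclass
class SketchAttrs:
    # Hypothetical attrs class, for illustration only.
    level: int = 1

    @classmethod
    def from_dict(cls, dct: Dict[str, Any]) -> "SketchAttrs":
        known = {f.name for f in dataclasses.fields(cls)}
        # Keep only keys that correspond to declared dataclass fields.
        return cls(**{k: v for k, v in dct.items() if k in known})


# "futureField" stands in for a key added by a newer schema version.
attrs = SketchAttrs.from_dict({"level": 2, "futureField": "ignored"})
```

Without the filter, passing the unknown key to the constructor would raise a TypeError.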

class atlas_doc_parser.mark_or_node.BaseMarkOrNode(type: str = <factory>)[source]

Base class for ADF marks and nodes.

is_type_of(expected_types: TypeEnum | list[TypeEnum]) bool[source]

Check if this element’s type matches one or more expected types.

Parameters:

expected_types – A single TypeEnum member or list of TypeEnum members to match against. If a list is provided, returns True if this element’s type matches ANY of the expected types.

Returns:

True if this element’s type matches (any of) the expected type(s).

Example:

>>> BaseNode(...).is_type_of(TypeEnum.paragraph)
True
>>> BaseMark(...).is_type_of([TypeEnum.strong, TypeEnum.em])
True
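
The matching rule described above can be sketched in a self-contained form. SketchTypeEnum and SketchElement are hypothetical stand-ins, and comparing the element's type string against the enum member's value is an assumption about how the match is made.

```python
import dataclasses
import enum
from typing import List, Union


class SketchTypeEnum(enum.Enum):
    # Hypothetical subset of ADF type names.
    paragraph = "paragraph"
    strong = "strong"
    em = "em"


@dataclasses.dataclass
class SketchElement:
    type: str

    def is_type_of(
        self,
        expected_types: Union[SketchTypeEnum, List[SketchTypeEnum]],
    ) -> bool:
        # A list matches if ANY member matches; a single member must match exactly.
        if isinstance(expected_types, list):
            return any(self.type == t.value for t in expected_types)
        return self.type == expected_types.value


strong = SketchElement(type="strong")
matches_any = strong.is_type_of([SketchTypeEnum.strong, SketchTypeEnum.em])
matches_paragraph = strong.is_type_of(SketchTypeEnum.paragraph)
```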

class atlas_doc_parser.mark_or_node.BaseMark(type: str = <factory>)[source]

Base class for ADF marks (text formatting).

Marks represent formatting applied to text nodes, such as:

  • strong (bold)

  • em (italic)

  • link (hyperlink)

  • code (inline code)

Subclasses should override to_markdown() to provide format conversion.

classmethod from_dict(dct: Dict[str, Any]) T_MARK[source]

Deserialize from dictionary.

Handles nested attrs deserialization if the subclass defines an attrs field with a type that has from_dict().

to_dict() Dict[str, Any][source]

Serialize to dictionary, handling nested attrs.

to_markdown(text: str) str[source]

Apply this mark’s formatting to text.

The default implementation returns the input text unchanged. This design reflects the library’s philosophy:

  1. Content extraction over formatting. In the AI era, we convert ADF to Markdown primarily to extract textual content for LLMs, RAG systems, and knowledge bases. Preserving formatting is secondary to preserving content.

  2. When in doubt, preserve content without formatting. If a mark type doesn’t have a standard Markdown equivalent (e.g., background color), we return the raw text rather than inventing custom syntax or losing the content entirely.

  3. Use native Markdown only. We prefer standard Markdown syntax (**bold**, *italic*). Dialect-specific extensions are avoided.

Subclasses override this method to apply formatting. For example, MarkStrong.to_markdown("text") returns "**text**".

Parameters:

text – The text content to format.

Returns:

The formatted text. Default returns text unchanged.
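The override pattern can be sketched as below. The classes are hypothetical stand-ins written for this example rather than the library's own MarkStrong; only the "**text**" result for a strong mark is taken from the description above.

```python
import dataclasses


@dataclasses.dataclass
class SketchMark:
    type: str = "unknown"

    def to_markdown(self, text: str) -> str:
        # Default: keep the content, drop the formatting.
        return text


@dataclasses.dataclass
class SketchMarkStrong(SketchMark):
    type: str = "strong"

    def to_markdown(self, text: str) -> str:
        # Standard Markdown bold.
        return f"**{text}**"


@dataclasses.dataclass
class SketchMarkBackgroundColor(SketchMark):
    # No standard Markdown equivalent, so the default is inherited.
    type: str = "backgroundColor"


bold = SketchMarkStrong().to_markdown("text")
plain = SketchMarkBackgroundColor().to_markdown("text")
```

In this sketch a mark without a Markdown equivalent needs no code at all: inheriting the default is what preserves the content.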

class atlas_doc_parser.mark_or_node.BaseNode(type: str = <factory>)[source]

Base class for ADF nodes (document structure elements).

Nodes represent structural elements of the document, such as:

  • Block nodes: paragraph, heading, codeBlock, table

  • Inline nodes: text, mention, emoji

Nodes can contain:

  • attrs: Node-specific attributes

  • content: Child nodes (for container nodes)

  • marks: Text formatting (for inline nodes)

Subclasses should override to_markdown() to provide format conversion.

classmethod from_dict(dct: Dict[str, Any]) T_NODE[source]

Deserialize from dictionary.

Handles nested deserialization of:

  • attrs: Using the field type’s from_dict()

  • content: Using parse_node() for each child

  • marks: Using parse_mark() for each mark

Unimplemented node/mark types are gracefully skipped with an optional warning (controlled by settings.WARN_UNIMPLEMENTED_TYPE). Other parsing errors are propagated normally.
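The skip-with-warning behavior can be sketched as follows. Everything here is a hypothetical stand-in: the PARSERS registry, the UnimplementedTypeError exception, the parse_content helper, and the module-level WARN_UNIMPLEMENTED_TYPE flag (standing in for settings.WARN_UNIMPLEMENTED_TYPE) are assumptions, and this parse_node is a toy version rather than the library's function.

```python
import warnings
from typing import Any, Callable, Dict, List

# Stand-in for settings.WARN_UNIMPLEMENTED_TYPE (assumed to be a boolean).
WARN_UNIMPLEMENTED_TYPE = True


class UnimplementedTypeError(Exception):
    """Hypothetical error for a type name with no registered parser."""


def parse_text(dct: Dict[str, Any]) -> Dict[str, Any]:
    # A missing "text" key raises KeyError, which is NOT swallowed below.
    return {"kind": "text", "text": dct["text"]}


# Hypothetical registry mapping type names to parser callables.
PARSERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {"text": parse_text}


def parse_node(dct: Dict[str, Any]) -> Any:
    type_name = dct["type"]
    if type_name not in PARSERS:
        raise UnimplementedTypeError(type_name)
    return PARSERS[type_name](dct)


def parse_content(items: List[Dict[str, Any]]) -> List[Any]:
    children = []
    for item in items:
        try:
            children.append(parse_node(item))
        except UnimplementedTypeError as exc:
            # Skip only unimplemented types; other errors propagate.
            if WARN_UNIMPLEMENTED_TYPE:
                warnings.warn(f"skipping unimplemented type: {exc}")
    return children


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    children = parse_content(
        [{"type": "text", "text": "hi"}, {"type": "futureNode"}]
    )
```

Catching only the dedicated exception is what keeps genuine parsing errors, such as a malformed text node, visible to the caller.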

to_dict() Dict[str, Any][source]

Serialize to dictionary, handling nested attrs, content, and marks.

to_markdown(ignore_error: bool = False) str[source]

Convert this node to Markdown format.

The default implementation raises NotImplementedError. This is intentional for several reasons:

  1. Fail fast during development. When implementing new node types, we want to immediately discover which nodes haven’t implemented to_markdown() rather than silently producing empty output or skipping content. This helps catch missing implementations early.

  2. The ignore_error parameter provides an escape hatch. In production, if our code has bugs or a node type is partially implemented, users can pass ignore_error=True to gracefully skip nodes that fail to convert. This flag should be propagated recursively to all nested to_markdown() calls via helper functions like content_to_markdown().

  3. Error handling is explicit. The library user decides whether to fail fast (for debugging and development) or degrade gracefully (for production use cases where partial output is acceptable).

Subclasses must override this method to provide actual conversion logic.

Parameters:

ignore_error – If True, errors in nested conversions are silently skipped. If False (default), errors propagate immediately. This flag should be passed down to any nested to_markdown() calls.

Returns:

The Markdown representation of this node.

Raises:

NotImplementedError – Always raised by the base class to ensure subclasses implement this method.
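
How ignore_error might propagate through nested calls can be sketched as below. The node classes are hypothetical, and although content_to_markdown() is named above, its signature and behavior here are assumptions made for illustration.

```python
import dataclasses
from typing import List


@dataclasses.dataclass
class SketchNode:
    def to_markdown(self, ignore_error: bool = False) -> str:
        # Base behavior: fail fast until a subclass overrides this.
        raise NotImplementedError(type(self).__name__)


@dataclasses.dataclass
class SketchText(SketchNode):
    text: str = ""

    def to_markdown(self, ignore_error: bool = False) -> str:
        return self.text


@dataclasses.dataclass
class SketchUnfinished(SketchNode):
    """A node type whose to_markdown() has not been written yet."""


def content_to_markdown(
    content: List[SketchNode],
    ignore_error: bool = False,
) -> str:
    parts = []
    for node in content:
        try:
            # Pass the flag down so deeper levels behave the same way.
            parts.append(node.to_markdown(ignore_error=ignore_error))
        except Exception:
            if not ignore_error:
                raise
            # With ignore_error=True the failing child is skipped.
    return "".join(parts)


@dataclasses.dataclass
class SketchParagraph(SketchNode):
    content: List[SketchNode] = dataclasses.field(default_factory=list)

    def to_markdown(self, ignore_error: bool = False) -> str:
        return content_to_markdown(self.content, ignore_error=ignore_error)


para = SketchParagraph(
    content=[SketchText("a"), SketchUnfinished(), SketchText("b")]
)
lenient = para.to_markdown(ignore_error=True)
```

With the default ignore_error=False, the same call raises NotImplementedError from the unfinished child, which is the fail-fast behavior described above.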