The HTML DOM

Introduction

This article builds on the previous article about the DOM and explores the specifics of the HTML DOM.

HTML is a markup language similar to XML, though more lax in its syntax. It is close enough to XML to leverage the DOM. However, unlike
XML, which is eXtensible, HTML has a finite number of allowed tags. We’ll see how this affects the implementation.

HTML tags can be separated into two categories:

  • The « standard » tags, which include <div>, <span> and <p>, and in fact the majority of HTML tags. Each of those tags has a distinct semantic value, but none has a particular DOM API warranting a special implementation.
  • The « special » tags. Those include media tags such as <img>, <video> and <audio>, but also tags with special attributes such as <meta>. As a rule of thumb, any tag with attributes of its own (like « src » for <img>) requires a specific implementation.

Implementation

https://github.com/silexlabs/Cocktail/tree/master/cocktail/core/html

The HTML implementation is in the « html » package, located in the « core » package.
This package contains two main classes:

  • HTMLElement. This is the base class for every HTML node in the document. When the HTML document is parsed, one instance of this class, or of one of its derived classes, is created for each HTML tag.
    Although every HTML node class inherits from it, this class is not abstract: it is used directly for each « standard » tag (<div>, <span>…) as defined above. It is also the default for unknown tags; the document is then no longer valid HTML, but it is still displayed.
  • HTMLDocument. This is one of the main classes in Cocktail. It represents a full HTML document and is one of the first classes to be instantiated. It plays a major part in the startup phase.

Most of the remaining classes in this package inherit from HTMLElement and implement the « special » tags. For instance, HTMLImageElement is the implementation of <img> and adds the code that loads a picture when the « src » attribute is set.
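
To make the idea concrete, here is a minimal TypeScript sketch of a « special » element reacting to one of its attributes. It is not Cocktail’s code: SketchHTMLElement, SketchHTMLImageElement, requestedUrls and startLoading are hypothetical names, and the picture load is only recorded rather than performed.

```typescript
// Hypothetical stand-in for the shared base class (not Cocktail's API).
class SketchHTMLElement {
  private attributes: Record<string, string> = {};

  constructor(public readonly tagName: string) {}

  getAttribute(name: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.attributes, name)
      ? this.attributes[name]
      : null;
  }

  setAttribute(name: string, value: string): void {
    this.attributes[name] = value;
  }
}

// Hypothetical stand-in for the <img> implementation.
class SketchHTMLImageElement extends SketchHTMLElement {
  // URLs for which a load was requested; recorded instead of doing real I/O.
  readonly requestedUrls: string[] = [];

  constructor() {
    super("img");
  }

  // Stores the attribute like any other element, then reacts to "src".
  setAttribute(name: string, value: string): void {
    super.setAttribute(name, value);
    if (name === "src") {
      this.startLoading(value);
    }
  }

  private startLoading(url: string): void {
    this.requestedUrls.push(url);
  }
}

const image = new SketchHTMLImageElement();
image.setAttribute("alt", "A logo");    // plain attribute, no side effect
image.setAttribute("src", "logo.png");  // triggers the (simulated) load
```

Only the « src » attribute has a side effect here; every other attribute falls through to the behaviour of the base class, which is why « standard » tags need no class of their own.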

The HTMLElement-derived class instantiated for a given HTML tag is chosen in HTMLDocument::createElement, which switches on the tag name and instantiates the right
class. Each time a new « special » tag is implemented, it is added to the switch.
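
The shape of such a factory can be sketched in TypeScript as follows. This is an illustration under assumptions, not the actual Cocktail source: all the Sketch… class names are hypothetical, and lower-casing the tag name before the switch is a choice made for this sketch only.

```typescript
// Hypothetical stand-ins for the element classes (not Cocktail's API).
class SketchHTMLElement {
  constructor(public readonly tagName: string) {}
}

class SketchHTMLImageElement extends SketchHTMLElement {
  constructor() {
    super("img");
  }
}

class SketchHTMLVideoElement extends SketchHTMLElement {
  constructor() {
    super("video");
  }
}

class SketchHTMLDocument {
  // Picks the class to instantiate from the tag name.
  createElement(tagName: string): SketchHTMLElement {
    const name = tagName.toLowerCase(); // assumption of this sketch
    switch (name) {
      // Each newly implemented "special" tag gets a case here.
      case "img":
        return new SketchHTMLImageElement();
      case "video":
        return new SketchHTMLVideoElement();
      // "Standard" and unknown tags share the base class.
      default:
        return new SketchHTMLElement(name);
    }
  }
}

const sketchDocument = new SketchHTMLDocument();
const img = sketchDocument.createElement("img");
const div = sketchDocument.createElement("div");
const unknown = sketchDocument.createElement("made-up-tag");
```

In this sketch, img is a SketchHTMLImageElement, while div and unknown are plain SketchHTMLElement instances, which mirrors the fallback described above for « standard » and unknown tags.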

HTMLElement

Let’s see what kind of tasks an HTMLElement performs.

  • It implements standard HTML methods and attributes not defined in the Node class of the DOM. For instance, attributes include ‘clientWidth’, ‘scrollTop’…
  • It exposes HTML IDL attributes such as ‘class’ (exposed to scripts as ‘className’ in the standard DOM API) or ‘id’. An IDL attribute is an attribute which can be defined either in the HTML document or via code. For instance, the ‘class’ attribute of a node is typically first defined directly in the HTML, then changed to a new value through scripting when the user interacts with the document.
  • It holds the style definition for the HTML element and is involved in the cascade. Style and cascading will be the subject of another article.
  • It participates in the creation of the rendering tree. The rendering tree is used to actually render the document to the screen and it will be discussed in another article.
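
The IDL attribute behaviour from the list above can be sketched in TypeScript as a pair of accessors backed by the element’s attributes. This is a hedged illustration rather than Cocktail’s implementation: SketchHTMLElement and its attribute storage are hypothetical, while the className and id property names follow the standard DOM API.

```typescript
// Hypothetical stand-in for HTMLElement, reduced to attribute handling.
class SketchHTMLElement {
  private attributes: Record<string, string> = {};

  getAttribute(name: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.attributes, name)
      ? this.attributes[name]
      : null;
  }

  setAttribute(name: string, value: string): void {
    this.attributes[name] = value;
  }

  // IDL attribute "className" reflects the "class" content attribute.
  get className(): string {
    const value = this.getAttribute("class");
    return value === null ? "" : value;
  }

  set className(value: string) {
    this.setAttribute("class", value);
  }

  // IDL attribute "id" reflects the "id" content attribute.
  get id(): string {
    const value = this.getAttribute("id");
    return value === null ? "" : value;
  }

  set id(value: string) {
    this.setAttribute("id", value);
  }
}

const button = new SketchHTMLElement();

// What a parser would do for <div id="ok" class="button">.
button.setAttribute("id", "ok");
button.setAttribute("class", "button");
const classAfterParsing = button.className;

// What a script would do later, when the user interacts with the page.
button.className = "button active";
```

Both paths read and write the same underlying value, so the markup and the script always agree on the current class of the element.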

HTMLDocument

The HTMLDocument does many things (probably too many!). Here is a non-exhaustive list:

  • It holds a reference to the whole HTML DOM tree and to special elements such as the body.
  • It instantiates a lot of internal Cocktail classes, such as the classes managing fonts or transitions.
  • It receives user input (mouse or touch events, for instance) and takes care of hit testing it, for instance to find which node is currently hovered.
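
Hit testing, mentioned in the last item, can be illustrated with a small TypeScript sketch that walks a tree of rectangles and returns the deepest node under a point. It relies on assumptions that may not hold in Cocktail: SketchNode, Rect and hitTest are hypothetical names, bounds are absolute coordinates, children lie inside their parent’s bounds, and later siblings are treated as being on top.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical node with already computed, absolute bounds.
class SketchNode {
  readonly children: SketchNode[] = [];

  constructor(public readonly name: string, public readonly bounds: Rect) {}

  appendChild(child: SketchNode): SketchNode {
    this.children.push(child);
    return child;
  }
}

function contains(rect: Rect, x: number, y: number): boolean {
  return (
    x >= rect.x &&
    x < rect.x + rect.width &&
    y >= rect.y &&
    y < rect.y + rect.height
  );
}

// Returns the deepest node containing the point, or null if there is none.
function hitTest(node: SketchNode, x: number, y: number): SketchNode | null {
  if (!contains(node.bounds, x, y)) {
    return null;
  }
  // Later siblings are assumed to be on top, so they are tested first.
  for (let i = node.children.length - 1; i >= 0; i--) {
    const hit = hitTest(node.children[i], x, y);
    if (hit !== null) {
      return hit;
    }
  }
  return node;
}

const body = new SketchNode("body", { x: 0, y: 0, width: 200, height: 200 });
const box = body.appendChild(
  new SketchNode("div", { x: 10, y: 10, width: 100, height: 100 })
);
box.appendChild(new SketchNode("span", { x: 20, y: 20, width: 30, height: 30 }));

const hovered = hitTest(body, 25, 25); // the point lies inside the span
```

A document-level mouse handler would run a lookup like this on each mouse move and compare the result with the previously hovered node to decide which events to dispatch.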

The « html » package, and in particular the HTMLElement class, is where a lot of the concepts of Cocktail meet (HTML DOM, CSS cascade, rendering tree), making it one of the
most central pieces of code in the project.
