Crate css_parse

An implementation of CSS Syntax Level 3, plus various additional traits and macros to assist in parsing. It is intended to be used to build CSS or CSS-alike languages (for example SASS), but isn’t able to parse the full CSS grammar itself. It relies on the foundational css_lexer crate.

This crate provides the Parser struct, which builds upon Lexer. It borrows a &str which it will parse to produce AST nodes (any type that implements the Parse and ToCursors traits). AST nodes should parse themselves and any children using recursive descent.

Parsing requires a heap allocator to allocate into, bumpalo::Bump being the allocator of choice. The allocator needs to be created before parsing, and the parser result will have a lifetime bound to it.

The Parser may be configured with additional Features to allow for different parsing or lexing styles. All features supported by the Lexer are supported in the Parser also (for example enabling Feature::SingleLineComments will enable the css_lexer feature of the same name).

This crate provides some low level AST nodes that are likely to be common in any CSS-alike language, including the various base tokens (such as dimensions and operators). These can be referred to via the T! macro, and each T! implements the necessary traits to be parsed as an AST node. For example T![DashedIdent] represents a CSS ident with two leading dashes, and can be parsed and decomposed into its constituent Token (or Cursor or Span).

Additionally some generic structs are available to implement the general-purpose parts of CSS Syntax, such as ComponentValues. More on that below in the section titled Generic AST Nodes.

Lastly, traits and macros are provided to implement various parsing algorithms and make common parsing operations easier. For example, the ranged_feature macro makes it easy to build a node that implements the RangedFeature trait, which provides an algorithm for parsing a media feature in a range context.

Downstream implementations will likely want to build their own AST nodes to represent specific cover grammars, for example implementing the @property rule or the width: property declaration. Here’s a small guide on what is required to build such nodes:

§AST Nodes

To use this as a library, a set of AST nodes will need to be created. The root node (and ideally all nodes) needs to implement Parse, which will be given a mutable reference to an active Parser. Each Node will likely be a collection of other Nodes, calling Parser::parse::<T>() (where T is each child Node). Leaf Nodes will likely be wrappers around a single token (tip: use the T! nodes, which cover all single token needs):

use css_parse::*;
struct MyProperty {
  ident: T![Ident],
  colon: T![Colon],
  dimension: T![Dimension],
}
impl<'a> Parse<'a> for MyProperty {
  fn parse<I>(p: &mut Parser<'a, I>) -> Result<Self>
  where
    I: Iterator<Item = Cursor> + Clone,
  {
    let ident = p.parse::<T![Ident]>()?;
    let colon = p.parse::<T![Colon]>()?;
    let dimension = p.parse::<T![Dimension]>()?;
    Ok(Self { ident, colon, dimension })
  }
}

AST nodes will also need to implement ToCursors - which is given an abstract CursorSink to put the cursors back into, in order, so that they can be built back up into the original source text. Implementing ToCursors allows for all manner of other useful downstream operations such as concatenation, transforms (e.g. minification) and so on.

use css_parse::*;
struct MyProperty {
  ident: T![Ident],
  colon: T![Colon],
  dimension: T![Dimension],
}
impl ToCursors for MyProperty {
  fn to_cursors(&self, s: &mut impl CursorSink) {
    s.append(self.ident.into());
    s.append(self.colon.into());
    s.append(self.dimension.into());
  }
}
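
The idea of handing cursors to an abstract sink can be illustrated without the crate itself. The sketch below is a self-contained toy model: ToyCursor, ToySink, WriteSink, CountSink and ToyProperty are hypothetical names invented for illustration and are not part of css_parse. It only assumes that a cursor can be reduced to a start and end offset into the source text.

```rust
// Toy model only: hypothetical types, NOT the css_parse API.
#[derive(Debug, Clone, Copy)]
struct ToyCursor {
    start: usize,
    end: usize,
}

trait ToySink {
    fn append(&mut self, c: ToyCursor);
}

// A sink that rebuilds text by slicing the original source.
struct WriteSink<'a> {
    source: &'a str,
    out: String,
}

impl<'a> ToySink for WriteSink<'a> {
    fn append(&mut self, c: ToyCursor) {
        self.out.push_str(&self.source[c.start..c.end]);
    }
}

// A sink that only counts cursors; the node does not need to know what the sink is for.
struct CountSink(usize);

impl ToySink for CountSink {
    fn append(&mut self, _c: ToyCursor) {
        self.0 += 1;
    }
}

struct ToyProperty {
    ident: ToyCursor,
    colon: ToyCursor,
    dimension: ToyCursor,
}

impl ToyProperty {
    // Emit child cursors in source order.
    fn to_cursors(&self, s: &mut impl ToySink) {
        s.append(self.ident);
        s.append(self.colon);
        s.append(self.dimension);
    }
}
```

Because the node only ever calls append, the same to_cursors implementation can feed a writer, a counter, or any other sink.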

Parse and ToCursors are the required trait implementations, but several more are available and make the work of parsing (or downstream analysis) easier…

§Peekable nodes

Everything that implements Parse is required to implement Parse::parse(), but gets Parse::try_parse() for free, which allows parent nodes to more easily branch by parsing a node, resetting during failure. Parse::try_parse() can be expensive though - parsing a Node is pretty much guaranteed to advance the Parser some number of tokens forward, and so a parser checkpoint needs to be stored so that - should Parse::parse() fail - the Parser can be rewound to that checkpoint as if the operation never happened. Reading N tokens forward only to forget that and re-do it all over can be costly and is likely the wrong tool to use when faced with a set of branching Nodes with an ambiguity of which to parse. So Nodes are also encouraged to implement Peek, which their parent nodes can call to check as an indicator that this Node may viably parse.
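
The checkpoint-and-rewind behaviour described above can be sketched with a self-contained toy model. Everything here (ToyTok, ToyParser, checkpoint, rewind, try_parse, parse_property) is a hypothetical stand-in written for illustration; the real Parser and ParserCheckpoint will differ in detail. The sketch assumes a checkpoint can be represented by a position in a token slice.

```rust
// Toy model only: hypothetical types, NOT the css_parse API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyTok {
    Ident,
    Colon,
    Dimension,
}

struct ToyParser<'a> {
    toks: &'a [ToyTok],
    pos: usize,
}

impl<'a> ToyParser<'a> {
    fn new(toks: &'a [ToyTok]) -> Self {
        Self { toks, pos: 0 }
    }

    // Consume the next token if it matches `want`, otherwise fail without advancing.
    fn expect(&mut self, want: ToyTok) -> Result<ToyTok, String> {
        match self.toks.get(self.pos) {
            Some(&t) if t == want => {
                self.pos += 1;
                Ok(t)
            }
            other => Err(format!("expected {:?}, found {:?}", want, other)),
        }
    }

    // In this toy, a checkpoint is just the current position.
    fn checkpoint(&self) -> usize {
        self.pos
    }

    fn rewind(&mut self, checkpoint: usize) {
        self.pos = checkpoint;
    }

    // Run `f`; on failure, rewind so the caller sees an untouched parser.
    fn try_parse<T>(
        &mut self,
        f: impl FnOnce(&mut Self) -> Result<T, String>,
    ) -> Result<T, String> {
        let cp = self.checkpoint();
        let result = f(self);
        if result.is_err() {
            self.rewind(cp);
        }
        result
    }
}

// Parses `Ident Colon Dimension`; it may consume tokens before failing.
fn parse_property(p: &mut ToyParser) -> Result<(ToyTok, ToyTok, ToyTok), String> {
    Ok((
        p.expect(ToyTok::Ident)?,
        p.expect(ToyTok::Colon)?,
        p.expect(ToyTok::Dimension)?,
    ))
}
```

In this model a failed parse_property leaves the position two tokens in, while wrapping it in try_parse restores the position. The work of reading those two tokens is still wasted, which is the cost a cheap peek avoids.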

Most nodes will know they can only accept a certain number of tokens, per their cover grammar. Peek is a useful way to encode this; Peek::peek gets an immutable reference to the Parser, from which it can call Parser::peek_n() (an immutable operation that can’t change the position of the parser) to look ahead to other tokens and establish if they would cause Parse::parse() to fail. There is still a cost to this, and so Peek::peek should only look ahead the smallest number of tokens needed to confidently know that it can begin parsing, rather than looking ahead a large number of tokens. For the most part peeking one or two tokens should be sufficient. An easy implementation for Peek is to simply set the Peek::PEEK_KINDSET const, which the provided implementation of Peek::peek() will use to check the cursor matches this KindSet.

use css_parse::*;
enum LengthOrAuto {
  Length(T![Dimension]), // A Dimension, like `px`
  Auto(T![Ident]),       // The Ident of `auto`
}
impl<'a> Peek<'a> for LengthOrAuto {
  const PEEK_KINDSET: KindSet = KindSet::new(&[Kind::Dimension, Kind::Ident]);
}
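
To show why a kind-set check is cheap, here is a self-contained toy model of a set of token kinds packed into a bitmask. ToyKind, ToyKindSet, LENGTH_OR_AUTO and peek_length_or_auto are hypothetical names; this is a sketch of the general technique and makes no claim about how KindSet is implemented in the crate.

```rust
// Toy model only: hypothetical types, NOT the css_parse API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyKind {
    Ident = 0,
    Colon = 1,
    Dimension = 2,
    Whitespace = 3,
}

// A set of kinds packed into one integer, so membership is a single bit test.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ToyKindSet(u32);

impl ToyKindSet {
    // `const fn` so the set can be built in a `const` item.
    const fn new(kinds: &[ToyKind]) -> Self {
        let mut bits = 0u32;
        let mut i = 0;
        // `for` loops are not allowed in `const fn`, so use `while`.
        while i < kinds.len() {
            bits |= 1 << (kinds[i] as u32);
            i += 1;
        }
        Self(bits)
    }

    const fn contains(&self, kind: ToyKind) -> bool {
        (self.0 & (1 << (kind as u32))) != 0
    }
}

const LENGTH_OR_AUTO: ToyKindSet = ToyKindSet::new(&[ToyKind::Dimension, ToyKind::Ident]);

// Look at the next token kind without consuming anything.
fn peek_length_or_auto(upcoming: &[ToyKind]) -> bool {
    match upcoming.first() {
        Some(&kind) => LENGTH_OR_AUTO.contains(kind),
        None => false,
    }
}
```

The peek inspects a single upcoming kind and never advances, so a parent node can test several candidate children in turn at very little cost.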

§Single token Nodes

If a node represents just a single token, for example a keyword, then its Parse implementation should call Parser::peek to check if it can be parsed, then Parser::next to get the cursor, and construct the node from that cursor. The Peek trait should accurately determine if the Node can be parsed from the given Cursor. Single token parsing may need to branch if it is an enum of variants:

use css_parse::*;
enum LengthOrAuto {
  Length(T![Dimension]), // A Dimension, like `px`
  Auto(T![Ident]),       // The Ident of `auto`
}
impl<'a> Peek<'a> for LengthOrAuto {
  const PEEK_KINDSET: KindSet = KindSet::new(&[Kind::Dimension, Kind::Ident]);
}
impl<'a> Parse<'a> for LengthOrAuto {
  fn parse<I>(p: &mut Parser<'a, I>) -> Result<Self>
  where
    I: Iterator<Item = Cursor> + Clone,
  {
    if p.peek::<T![Dimension]>() {
      p.parse::<T![Dimension]>().map(Self::Length)
    } else {
      p.parse::<T![Ident]>().map(Self::Auto)
    }
  }
}
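
A kind-only check accepts any ident, not just a particular keyword. Where a node should only match a specific keyword, the peek can also inspect the token's text. The following self-contained toy model sketches that idea; ToyToken and ToyLengthOrAuto are hypothetical types, not part of css_parse, and the case-insensitive comparison is an assumption based on CSS keywords being ASCII case-insensitive.

```rust
// Toy model only: hypothetical types, NOT the css_parse API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyToken<'a> {
    Ident(&'a str),
    Dimension(f32, &'a str),
    Colon,
}

#[derive(Debug, PartialEq)]
enum ToyLengthOrAuto<'a> {
    Length(f32, &'a str),
    Auto,
}

impl<'a> ToyLengthOrAuto<'a> {
    // Accurate peek: an ident only qualifies when it is the `auto` keyword.
    fn peek(tok: ToyToken<'a>) -> bool {
        match tok {
            ToyToken::Dimension(..) => true,
            ToyToken::Ident(name) => name.eq_ignore_ascii_case("auto"),
            ToyToken::Colon => false,
        }
    }

    // Build the node from a single token; callers are expected to peek first.
    fn from_token(tok: ToyToken<'a>) -> Result<Self, String> {
        match tok {
            ToyToken::Dimension(value, unit) => Ok(Self::Length(value, unit)),
            ToyToken::Ident(name) if name.eq_ignore_ascii_case("auto") => Ok(Self::Auto),
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}
```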

§Convenience algorithms

For more complex algorithms where nodes might parse many child nodes or have some delicate or otherwise awkward steps, additional traits exist to make implementing AST nodes trivial for these use cases.

StyleSheet - AST nodes representing a stylesheet should use this to, well, parse a stylesheet.
Declaration - AST nodes representing a declaration (aka “property”) should use this to parse a declaration.
QualifiedRule - AST nodes representing a “Qualified Rule” (e.g. a style rule) should use this to parse a QualifiedRule.
CompoundSelector - AST nodes representing a CSS selector should use this to parse a list of nodes implementing SelectorComponent.
SelectorComponent - AST nodes representing an individual selector component, such as a tag or class or pseudo element, should use this to parse the set of specified selector components.

The *List traits are also available to more easily parse lists of things, such as preludes or blocks:

PreludeList - AST nodes representing a rule’s prelude should use this. It simply repeatedly parses its items until it encounters the start of a block (<{-token>) or a <;-token>.
FeatureConditionList - AST nodes representing a prelude “condition list” should use this. It parses the complex condition logic in rules like @media, @supports or @container.
DeclarationList - AST nodes representing a block which can only accept “Declarations” should use this. This is an implementation of <declaration-list>.
DeclarationRuleList - AST nodes representing a block which can accept either “At Rules” or “Declarations” but cannot accept “Qualified Rules” should use this. This is an implementation of <declaration-rule-list>.
RuleList - AST nodes representing a block which can accept either “At Rules” or “Qualified Rules” but cannot accept “Declarations” should use this. This is an implementation of <rule-list>.
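
The "parse items until a terminator" shape shared by these list traits can be sketched in a few lines. The example below is a self-contained toy model of the prelude case only; PreludeTok and parse_prelude_list are hypothetical names, and treating commas as skippable separators is an assumption made to keep the sketch short.

```rust
// Toy model only: hypothetical types, NOT the css_parse API.
#[derive(Debug, Clone, PartialEq)]
enum PreludeTok {
    Ident(String),
    Comma,
    LeftCurly,
    Semicolon,
}

// Collect prelude items until a `{` or `;` is next, leaving that token unconsumed.
// Returns the items and the number of tokens consumed.
fn parse_prelude_list(toks: &[PreludeTok]) -> (Vec<String>, usize) {
    let mut items = Vec::new();
    let mut pos = 0;
    while let Some(tok) = toks.get(pos) {
        match tok {
            PreludeTok::LeftCurly | PreludeTok::Semicolon => break,
            PreludeTok::Ident(name) => items.push(name.clone()),
            PreludeTok::Comma => {} // separators are skipped in this toy
        }
        pos += 1;
    }
    (items, pos)
}
```

Leaving the terminating token unconsumed lets the caller decide whether a block or the end of a statement follows.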

The *Feature traits are also available to more easily parse “feature conditions”. These are the conditions supported in a FeatureConditionList, e.g. the conditions inside of @media, @container or @supports rules.

RangedFeature - AST nodes representing a feature condition in the “ranged” context.
BooleanFeature - AST nodes representing a feature condition in the “boolean” context.
DiscreteFeature - AST nodes representing a feature condition with discrete keywords.
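
As a rough illustration of what parsing in a "ranged" context involves, the self-contained toy below parses a single `name <op> value` form from whitespace-separated text. ToyComparison, ToyRange, parse_comparison and parse_range are hypothetical names and not the crate's RangedFeature or Comparison types. The sketch handles only this one form and ignores any other forms the real algorithm may accept.

```rust
// Toy model only: hypothetical types, NOT the css_parse API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyComparison {
    LessThan,
    LessThanEqual,
    Equal,
    GreaterThanEqual,
    GreaterThan,
}

#[derive(Debug, PartialEq)]
struct ToyRange<'a> {
    name: &'a str,
    comparison: ToyComparison,
    value: &'a str,
}

fn parse_comparison(op: &str) -> Option<ToyComparison> {
    match op {
        "<" => Some(ToyComparison::LessThan),
        "<=" => Some(ToyComparison::LessThanEqual),
        "=" => Some(ToyComparison::Equal),
        ">=" => Some(ToyComparison::GreaterThanEqual),
        ">" => Some(ToyComparison::GreaterThan),
        _ => None,
    }
}

// Parse `name <op> value`, e.g. `width >= 600px`, from whitespace-separated input.
fn parse_range(input: &str) -> Option<ToyRange<'_>> {
    let mut parts = input.split_whitespace();
    let name = parts.next()?;
    let comparison = parse_comparison(parts.next()?)?;
    let value = parts.next()?;
    // Reject trailing input so that partial matches are not accepted.
    if parts.next().is_some() {
        return None;
    }
    Some(ToyRange { name, comparison, value })
}
```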

§Generic AST nodes

In addition to the traits which allow for parsing bespoke AST Nodes, this crate provides a set of generic AST node structs/enums. These “general purpose” nodes are useful when an AST node fails to parse and needs to consume some tokens in a generic manner, according to the rules of CSS Syntax:

syntax::QualifiedRule provides the generic <qualified-rule> grammar.
syntax::Declaration provides the generic <declaration> grammar.
syntax::BangImportant provides the <!important> grammar.
syntax::ComponentValue provides the <component-value> grammar, used by other generic nodes.
syntax::SimpleBlock provides the generic <simple-block> grammar.
syntax::FunctionBlock provides the generic <function-block> grammar.
syntax::ComponentValues provides a list of <component-value> nodes, per “parse a list of component values”.
syntax::BadDeclaration provides a struct to capture the bad declaration steps.
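
To give a feel for consuming tokens in a generic manner, the self-contained toy below consumes "component values" from a string of characters: a bracketed run becomes one nested block and anything else is a single token. It loosely follows the shape of the "consume a component value" and "consume a simple block" steps in CSS Syntax, but ToyValue, closer, consume_value and consume_values are hypothetical names, and real tokens are of course richer than single characters.

```rust
// Toy model only: hypothetical types, NOT the css_parse API.
#[derive(Debug, PartialEq)]
enum ToyValue {
    Token(char),
    Block { open: char, values: Vec<ToyValue> },
}

// The closing character that mirrors an opening bracket, if `open` is one.
fn closer(open: char) -> Option<char> {
    match open {
        '{' => Some('}'),
        '[' => Some(']'),
        '(' => Some(')'),
        _ => None,
    }
}

// Consume one value starting at `*pos`: a whole bracketed block, or a single token.
fn consume_value(src: &[char], pos: &mut usize) -> Option<ToyValue> {
    let c = *src.get(*pos)?;
    *pos += 1;
    let close = match closer(c) {
        Some(close) => close,
        None => return Some(ToyValue::Token(c)),
    };
    let mut values = Vec::new();
    // Consume nested values until the matching closer, or the end of input.
    while let Some(&next) = src.get(*pos) {
        if next == close {
            *pos += 1;
            break;
        }
        if let Some(v) = consume_value(src, pos) {
            values.push(v);
        }
    }
    Some(ToyValue::Block { open: c, values })
}

fn consume_values(text: &str) -> Vec<ToyValue> {
    let src: Vec<char> = text.chars().collect();
    let mut pos = 0;
    let mut out = Vec::new();
    while let Some(v) = consume_value(&src, &mut pos) {
        out.push(v);
    }
    out
}
```

Because brackets are always kept balanced, a node that fails to parse can hand off to this kind of generic consumption without leaving the parser stranded inside a half-read block.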

§Test Helpers

In order to make it much easier to test the functionality of AST nodes, enabling the testing feature will provide two testing macros which make setting up a test trivial.

assert_parse! will parse the given string against the given node, asserting that it parses successfully and can be written back out to the same output.
assert_parse_error! will parse the given string against the node, expecting the parse to fail.

It is advised to add the testing flag as a dev-dependencies feature to enable these only during test:

[dependencies]
css_parse = "*"

[dev-dependencies]
css_parse = { version = "*", features = ["testing"] }
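
The round-trip check that assert_parse! is described as performing (parse, write back out, compare with the input) can be sketched with a toy macro. This is not the crate's macro: toy_parse, toy_write and toy_assert_roundtrip! are hypothetical stand-ins for a real parser, a real cursor sink and the real test helper.

```rust
// Toy model only: a hypothetical macro, NOT the css_parse `assert_parse!` implementation.
// `toy_parse` and `toy_write` stand in for a real parser and a real cursor sink.
fn toy_parse(source: &str) -> Result<Vec<&str>, String> {
    // "Parse" `name:value` into its three pieces, or fail.
    let idx = source
        .find(':')
        .ok_or_else(|| format!("no colon in {:?}", source))?;
    let (name, rest) = source.split_at(idx);
    let value = &rest[1..];
    if name.is_empty() || value.is_empty() {
        return Err(format!("incomplete declaration {:?}", source));
    }
    Ok(vec![name, ":", value])
}

fn toy_write(parts: &[&str]) -> String {
    parts.concat()
}

macro_rules! toy_assert_roundtrip {
    ($source:expr) => {{
        let source: &str = $source;
        // Panic with a readable message if the parse fails.
        let parts = match toy_parse(source) {
            Ok(parts) => parts,
            Err(e) => panic!("failed to parse {:?}: {}", source, e),
        };
        // Write the parsed pieces back out and compare with the input.
        let written = toy_write(&parts);
        assert_eq!(written, source, "round trip changed the text");
    }};
}
```

A test would then be a single line such as `toy_assert_roundtrip!("width:1px");`.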

§Example

A small example on how to define an AST node:

use css_parse::*;
#[derive(Debug)]
struct MyProperty {
  ident: T![Ident],
  colon: T![Colon],
  dimension: T![Dimension],
}
impl<'a> Parse<'a> for MyProperty {
  fn parse<I>(p: &mut Parser<'a, I>) -> Result<Self>
  where
    I: Iterator<Item = Cursor> + Clone,
  {
    let ident = p.parse::<T![Ident]>()?;
    let colon = p.parse::<T![Colon]>()?;
    let dimension = p.parse::<T![Dimension]>()?;
    Ok(Self { ident, colon, dimension })
  }
}
impl ToCursors for MyProperty {
  fn to_cursors(&self, s: &mut impl CursorSink) {
    self.ident.to_cursors(s);
    self.colon.to_cursors(s);
    self.dimension.to_cursors(s);
  }
}

assert_parse!(EmptyAtomSet::ATOMS, MyProperty, "width:1px");

Re-exports§

pub use syntax::*;

Modules§

syntax: Various structs/enums that represent generic AST nodes.
test_helpers: Test macros available if built with features = ["testing"]
token_macros: Various macros that expand to AST nodes that wrap Tokens.

Macros§

Optionals: A helper type for parsing optional CSS grammar patterns where items can appear in any order but at most once each (the || combinator in CSS grammar).
T: The T! macro expands to the name of a type representing the Token of the same name. These can be used in struct fields to type child nodes.
assert_parse: (Requires feature “testing”) Given a Node, and a string, this will expand to code that sets up a parser, and parses the given string against the given node. If the parse failed this macro will panic with a readable failure. It then writes the result out using crate::CursorWriteSink, writing the parsed Node back out to a string. If the resulting string differs from the given string, then the macro will panic with a readable failure.
assert_parse_error: (Requires feature “testing”) Given a Node, and a string, this will expand to code that sets up a parser, and parses the given string against the given node. If the parse succeeded this macro will panic with a readable failure.
assert_parse_span: (Requires feature “testing”) Given a Node, and a multiline string, this will expand to code that sets up a parser, and parses the first line of the given string with the parser. It will then create a second string based on the span data and append it to the first line of the string, showing what was parsed and where the span rests.
boolean_feature: This macro expands to define an enum which already implements Parse and BooleanFeature, for a one-liner definition of a BooleanFeature.
custom_delim: A macro for defining a struct which captures a Kind::Delim with a specific character.
custom_double_delim: A macro for defining a struct which captures two adjacent Kind::Delim tokens, each with a specific character.
discrete_feature: This macro expands to define an enum which already implements Parse and DiscreteFeature, for a one-liner definition of a DiscreteFeature.
parse_optionals
pseudo_class: A macro for defining pseudo classes.
pseudo_element: A macro for defining pseudo elements.
ranged_feature: This macro expands to define an enum which already implements Parse and RangedFeature, for a one-liner definition of a RangedFeature.

Structs§

AssociatedWhitespaceRules: A [bitmask][bitmask_enum] representing rules around the whitespace surrounding a Kind::Delim token.
Cursor: Wraps Token with a SourceOffset, allows it to reason about the character data of the source text.
CursorCompactWriteSink: This is a CursorSink that wraps a sink (impl SourceCursorSink). On each CursorSink::append() call, it writes the contents of the given Cursor into the wrapped sink, using the given &'a str as the original source. Some tokens will not be output, and Whitespace tokens will always write out as a single ' '. It can be used as a light-weight minifier for ToCursors structs.
CursorInterleaveSink: This is a CursorSink that wraps a Sink (impl CursorSink) and a slice of Cursors to interleave. On each CursorSink::append() call, it appends to the delegate sink while also interleaving any of the Cursors from the slice, in the right places.
CursorOrderedSink: This is a CursorSink that buffers cursors and emits them in source order. It uses contiguous coverage tracking to eagerly emit cursors as soon as gaps are filled.
CursorOverlaySet
CursorOverlaySink: This is a CursorSink that wraps a SourceCursorSink, while also taking a CursorOverlaySet. As Cursors get appended into this sink, it will replay those to the underlying SourceCursorSink unless a CursorOverlaySet overlaps the Cursor’s span, at which point the overlay will be replayed to the underlying SourceCursorSink. This Sink is useful for collecting new Cursors (say from an AST) to overlap (or, say, transform) the underlying base Cursors (read: AST). In other words, writing over the top of the source.
CursorPrettyWriteSink: This is a CursorSink that wraps a sink (impl SourceCursorSink). On each CursorSink::append() call, it writes the contents of the given Cursor into the wrapped sink, using the given &'a str as the original source. It also attempts to write additional newlines and indentation to create a more aesthetically pleasing output. It can be used as a light-weight formatter for ToCursors structs.
CursorToSourceCursorSink
CursorWriteSink: This is a CursorSink that wraps a Writer (impl fmt::Write). On each CursorSink::append() call, it writes the contents of the given Cursor into that Writer, using the given &'a str as the original source. This is useful as a way to turn Cursors into Strings or u8s (or files or whatever else implements Write).
Diagnostic: An issue that occurred during parse time.
DiagnosticMeta
Error: Core Diagnostic wrapper type.
Feature: A set of runtime feature flags which can be enabled individually or in combination, which will change the way Parser works.
KindSet: Match a token against one or more Kinds.
Optionals2
Optionals3
Optionals4
Optionals5
Parser
ParserCheckpoint: Represents a point during the Parser’s lifecycle; retaining state that can then be rewound.
ParserReturn
SourceCursor: Wraps Cursor with a str that represents the underlying character data for this cursor.
SourceOffset: Represents a position in the underlying source.
Span: Represents a range of text within a document, as a Start and End offset.
State
Token: An abstract representation of the chunk of the source text, retaining certain “facts” about the source.
Whitespace: A [bitmask][bitmask_enum] representing the characters that make up a Kind::Whitespace token.

Enums§

Comparison: This enum represents a set of comparison operators, used in Ranged Media Features (see RangedFeature), and could be used in other parts of a CSS-alike language. This isn’t a strictly standard part of CSS, but is provided for convenience.
EmptyAtomSet: This enum represents an empty AtomSet. It can be used to Lex code when you’re not interested in capturing known keywords.
Kind: Kind represents the token “Type”, categorised mostly by the token types within the CSS Syntax spec.
PairWise: Represents either the left or right Kind of a PairWise set.
QuoteStyle: An enum representing the “Style” the Kind::String token represents.
Severity

Traits§

AtomSet: Usage with #[derive(AtomSet)]
BooleanFeature: This trait provides an implementation for parsing a “Media Feature” in the “Boolean” context. This is complementary to the other media features: RangedFeature and DiscreteFeature.
CompoundSelector
CursorSink: This trait provides the generic impl that ToCursors can use. This provides just enough API surface for nodes to put the cursors they represent into some buffer which can later be read, the details of which are elided.
CursorSource: This trait provides the generic impl that ToCursors can use. This provides just enough API surface for nodes to put the cursors they represent into some buffer which can later be read, the details of which are elided.
DeclarationValue: A trait that can be used for AST nodes representing a Declaration’s Value. It offers some convenience functions for handling such values.
DiscreteFeature: This trait provides an implementation for parsing a “Media Feature” that has a discrete keyword. This is complementary to the other media features: BooleanFeature and RangedFeature.
FeatureConditionList: This trait can be used for AST nodes representing a list of “Feature Conditions”. This is an amalgamation of Supports Conditions, Media Conditions, and Container Queries. This is an implementation of <at-rule-list>.
NodeMetadata: Aggregated metadata for nodes, that can propagate up a node tree.
NodeWithMetadata: A Node that has NodeMetadata
Parse: This trait allows AST nodes to construct themselves from a mutable Parser instance.
Peek: This trait allows AST nodes to indicate whether the Parser is in the right position to potentially Parse the node. Returning true from Peek is not a guarantee that a node will successfully parse; instead it offers an indication that the node can successfully parse the first token. This is useful for cheaply comparing a set of Nodes to see which one might viably parse, rather than calling Parser::try_parse() on each.
PreludeList
RangedFeature: This trait provides an implementation for parsing a “Media Feature” in the “Range” context.
RuleVariants: A trait that can be used for AST nodes representing a Declaration’s Value. It offers some convenience functions for handling such values.
SelectorComponent
SemanticEq: Trait for semantic equality comparison that ignores source positions and whitespace.
SourceCursorSink
StyleSheet: This trait provides an implementation for parsing a StyleSheet.
ToCursors: This trait allows AST nodes to decompose themselves back into a set of (ordered) Cursors.
ToNumberValue
ToSpan: A trait representing an object that can derive its own Span. This is very similar to From<MyStruct> for Span, however From<MyStruct> for Span requires Sized, meaning it is not dyn compatible.

Type Aliases§

Result
