An implementation of CSS Syntax Level 3, plus various additional traits and macros to assist in parsing. It is intended to be used to build CSS or CSS-alike languages (for example SASS), but isn’t able to parse the full CSS grammar itself. It relies on the foundational [css_lexer] crate.
This crate provides the Parser struct, which builds upon [Lexer][css_lexer::Lexer]. It borrows a &str which it will parse to produce AST nodes (any type that implements the Parse and ToCursors traits). AST nodes should parse themselves and any children using recursive descent.
Parsing requires a heap allocator to allocate into, [bumpalo::Bump] being the allocator of choice. The allocator needs to be created before parsing, and the parser result will have a lifetime bound to it.
The Parser may be configured with additional Features to allow for different parsing or lexing styles. All features supported by the [Lexer][css_lexer::Lexer] are supported in the Parser also (for example enabling Feature::SingleLineComments will enable [the css_lexer feature of the same name][css_lexer::Feature::SingleLineComments]).
This crate provides some low-level AST nodes that are likely to be common in any CSS-alike language, including the various base tokens (such as dimensions and operators). These can be referred to via the T! macro, and each T! implements the necessary traits to be parsed as an AST node. For example, T![DashedIdent] represents a CSS ident with two leading dashes, and can be parsed and decomposed into its constituent Token (or Cursor or Span).
Additionally some generic structs are available to implement the general-purpose parts of CSS Syntax, such as ComponentValues. More on that below in the section titled Generic AST Nodes.
Lastly, traits and macros are provided to implement various parsing algorithms and make common parsing operations easier. For example, the ranged_feature macro makes it easy to build a node that implements the RangedFeature trait, which provides an algorithm for parsing a media feature in a range context.
Downstream implementations will likely want to build their own AST nodes to represent specific cover grammars, for example implementing the @property rule or the width: property declaration. Here’s a small guide on what is required to build such nodes:
§AST Nodes
To use this as a library, a set of AST nodes will need to be created. The root node (and ideally all nodes) needs to implement Parse, which will be given a mutable reference to an active Parser. Each Node will likely be a collection of other Nodes, calling Parser::parse<T>() (where T is each child Node). Leaf Nodes will likely be wrappers around a single token (tip: use the T! nodes, which cover all single token needs):
use css_parse::*;

struct MyProperty {
    ident: T![Ident],
    colon: T![Colon],
    dimension: T![Dimension],
}

impl<'a> Parse<'a> for MyProperty {
    fn parse(p: &mut Parser<'a>) -> Result<Self> {
        // Parse each child in source order; `?` bails out on the first failure.
        let ident = p.parse::<T![Ident]>()?;
        let colon = p.parse::<T![Colon]>()?;
        let dimension = p.parse::<T![Dimension]>()?;
        Ok(Self { ident, colon, dimension })
    }
}
AST nodes will also need to implement ToCursors - which is given an abstract CursorSink to put the cursors back into, in order, so that they can be built back up into the original source text. Implementing ToCursors allows for all manner of other useful downstream operations such as concatenation, transforms (e.g. minification) and so on.
use css_parse::*;

struct MyProperty {
    ident: T![Ident],
    colon: T![Colon],
    dimension: T![Dimension],
}

impl ToCursors for MyProperty {
    fn to_cursors(&self, s: &mut impl CursorSink) {
        // Append the cursors in the same order they were parsed.
        s.append(self.ident.into());
        s.append(self.colon.into());
        s.append(self.dimension.into());
    }
}
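To make the round trip concrete, here is a minimal, self-contained sketch of the idea behind a cursor sink. It does not use css_parse at all: ToyCursor, ToySink, ToyWriteSink and ToyProperty are hypothetical stand-ins invented for illustration, and a "cursor" is reduced to a byte range into the source text.

```rust
// Hypothetical stand-ins for illustration only; none of these are css_parse types.

/// A toy "cursor": just a byte range into the original source.
#[derive(Clone, Copy)]
struct ToyCursor {
    start: usize,
    end: usize,
}

/// A toy sink: something cursors can be appended to, in order.
trait ToySink {
    fn append(&mut self, c: ToyCursor);
}

/// A sink that rebuilds text by copying each cursor's slice of the source.
struct ToyWriteSink<'a> {
    source: &'a str,
    out: String,
}

impl<'a> ToySink for ToyWriteSink<'a> {
    fn append(&mut self, c: ToyCursor) {
        self.out.push_str(&self.source[c.start..c.end]);
    }
}

/// A toy node holding three cursors, mirroring the shape of MyProperty.
struct ToyProperty {
    ident: ToyCursor,
    colon: ToyCursor,
    dimension: ToyCursor,
}

impl ToyProperty {
    /// Hand the cursors back to the sink in source order.
    fn to_cursors(&self, s: &mut impl ToySink) {
        s.append(self.ident);
        s.append(self.colon);
        s.append(self.dimension);
    }
}

/// Write a node back out to a String using the toy sink.
fn write_back(source: &str, node: &ToyProperty) -> String {
    let mut sink = ToyWriteSink { source, out: String::new() };
    node.to_cursors(&mut sink);
    sink.out
}
```

Because the node only knows about the sink trait, a different sink (say, one that skips whitespace) could be swapped in without touching the node, which is presumably what makes transforms such as minification possible downstream.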
Both Parse and ToCursors are the required trait implementations, but several more are also available and make the work of parsing (or downstream analysis) easier…
§Peekable nodes
Everything that implements Parse is required to implement Parse::parse(), but gets Parse::try_parse() for free, which allows parent nodes to more easily branch by parsing a node, resetting during failure. Parse::try_parse() can be expensive though - parsing a Node is pretty much guaranteed to advance the Parser some number of tokens forward, and so a parser checkpoint needs to be stored so that - should Parse::parse() fail - the Parser can be rewound to that checkpoint as if the operation never happened. Reading N tokens forward only to forget that and re-do it all over can be costly and is likely the wrong tool to use when faced with a set of branching Nodes with an ambiguity of which to parse. So Nodes are also encouraged to implement Peek, which their parent nodes can call to check as an indicator that this Node may viably parse.
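The checkpoint-and-rewind behaviour described above can be sketched with a toy parser. This is an illustration of the general technique under simplifying assumptions, not css_parse's implementation: Tok, ToyParser, expect and parse_declaration are hypothetical names, and the "checkpoint" is just the saved token position.

```rust
// Hypothetical toy types for illustration; not the css_parse API.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok {
    Ident,
    Colon,
    Dimension,
}

struct ToyParser<'a> {
    tokens: &'a [Tok],
    pos: usize,
}

impl<'a> ToyParser<'a> {
    fn new(tokens: &'a [Tok]) -> Self {
        Self { tokens, pos: 0 }
    }

    /// Consume the next token if it matches `want`, otherwise fail.
    fn expect(&mut self, want: Tok) -> Result<(), String> {
        match self.tokens.get(self.pos) {
            Some(&t) if t == want => {
                self.pos += 1;
                Ok(())
            }
            other => Err(format!("expected {:?}, found {:?}", want, other)),
        }
    }

    /// Run `f`; if it fails, rewind to where the parser was before the attempt.
    fn try_parse<T>(
        &mut self,
        f: impl FnOnce(&mut Self) -> Result<T, String>,
    ) -> Result<T, String> {
        let checkpoint = self.pos;
        let result = f(self);
        if result.is_err() {
            // All the tokens read during the failed attempt are "forgotten".
            self.pos = checkpoint;
        }
        result
    }
}

/// Parses `Ident Colon Dimension`, advancing up to three tokens.
fn parse_declaration(p: &mut ToyParser<'_>) -> Result<(), String> {
    p.expect(Tok::Ident)?;
    p.expect(Tok::Colon)?;
    p.expect(Tok::Dimension)
}
```

In this sketch a failed attempt on `Ident Colon Ident` reads two tokens before failing and rewinding; that wasted work is the cost that peeking first is meant to avoid.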
Most nodes will know they can only accept a certain number of tokens, per their cover grammar. Peek is a useful way to encode this; Peek::peek gets an immutable reference to the Parser, from which it can call Parser::peek_n() (an immutable operation that can’t change the position of the parser) to look ahead to other tokens and establish if they would cause Parse::parse() to fail. There is still a cost to this, and so Peek::peek should only look ahead the smallest number of tokens needed to confidently know that it can begin parsing, rather than looking ahead a large number of tokens. For the most part, peeking one or two tokens should be sufficient. An easy implementation for Peek is to simply set the Peek::PEEK_KINDSET const, which the provided implementation of Peek::peek() will use to check that the cursor matches this KindSet.
use css_parse::*; // brings Kind and KindSet into scope

enum LengthOrAuto {
    Length(T![Dimension]), // A Dimension, like `10px`
    Auto(T![Ident]),       // The Ident of `auto`
}

impl<'a> Peek<'a> for LengthOrAuto {
    // The provided Peek::peek() checks the next cursor against this set.
    const PEEK_KINDSET: KindSet = KindSet::new(&[Kind::Dimension, Kind::Ident]);
}
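As an aside on why a const set of kinds makes for a cheap peek: one plausible way to represent such a set is a bitmask, where membership is a single AND. The sketch below is an assumption for illustration only; ToyKind, ToyKindSet and peek_length_or_auto are hypothetical, and whether KindSet is actually implemented this way is not stated here.

```rust
// Hypothetical illustration; not the css_parse or css_lexer implementation.

#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyKind {
    Ident = 0,
    Dimension = 1,
    Colon = 2,
}

/// A set of kinds stored as one bit per kind.
#[derive(Clone, Copy)]
struct ToyKindSet(u32);

impl ToyKindSet {
    /// Build the set at compile time from a slice of kinds.
    const fn new(kinds: &[ToyKind]) -> Self {
        let mut bits = 0u32;
        let mut i = 0;
        // `for` loops are not allowed in const fn, so use `while`.
        while i < kinds.len() {
            bits |= 1 << (kinds[i] as u32);
            i += 1;
        }
        Self(bits)
    }

    /// Membership is a single bitwise AND.
    const fn contains(self, kind: ToyKind) -> bool {
        (self.0 & (1 << (kind as u32))) != 0
    }
}

/// The kinds a LengthOrAuto-like node could start with.
const LENGTH_OR_AUTO: ToyKindSet = ToyKindSet::new(&[ToyKind::Dimension, ToyKind::Ident]);

/// A one-token peek: does the next token's kind belong to the set?
fn peek_length_or_auto(next: ToyKind) -> bool {
    LENGTH_OR_AUTO.contains(next)
}
```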
§Single token Nodes
If a node represents just a single token, for example a keyword, then it can implement the Build trait instead of Parse. If it implements Build and Peek, it gets Parse for free. The Build trait is given an immutable reference to the Parser, and the single Cursor it intends to build, and should simply return Self, wrapping the Cursor. The Peek trait should accurately and completely determine whether the Node is able to be built from the given Cursor, therefore making Build infallible; Build can skip any of the checks that Peek already did, but may still need to branch if it is an enum of variants:
use css_parse::*;

enum LengthOrAuto {
    Length(T![Dimension]), // A Dimension, like `10px`
    Auto(T![Ident]),       // The Ident of `auto`
}

impl<'a> Peek<'a> for LengthOrAuto {
    const PEEK_KINDSET: KindSet = KindSet::new(&[Kind::Dimension, Kind::Ident]);
}

impl<'a> Build<'a> for LengthOrAuto {
    fn build(p: &Parser<'a>, c: Cursor) -> Self {
        // Peek already guaranteed the kind is Dimension or Ident,
        // so only the choice of variant remains.
        if c == Kind::Dimension {
            Self::Length(<T![Dimension]>::build(p, c))
        } else {
            Self::Auto(<T![Ident]>::build(p, c))
        }
    }
}
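How a node might get Parse "for free" from Peek and Build can be sketched with a blanket implementation. This is one way such a relationship could be expressed in Rust, shown with hypothetical toy traits (ToyPeek, ToyBuild, ToyParse) over a toy token type; it is not taken from css_parse's source.

```rust
// Hypothetical toy traits for illustration; not the css_parse definitions.

#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyKind {
    Ident,
    Dimension,
    Colon,
}

struct ToyParser<'a> {
    tokens: &'a [ToyKind],
    pos: usize,
}

/// Can this node start at the given token?
trait ToyPeek {
    fn peek(next: ToyKind) -> bool;
}

/// Infallibly build the node from a token that already passed `peek`.
trait ToyBuild: Sized {
    fn build(token: ToyKind) -> Self;
}

trait ToyParse: Sized {
    fn parse(p: &mut ToyParser<'_>) -> Result<Self, String>;
}

/// Anything that is both peekable and buildable is parseable:
/// peek at the next token, and if it fits, consume it and build.
impl<T: ToyPeek + ToyBuild> ToyParse for T {
    fn parse(p: &mut ToyParser<'_>) -> Result<Self, String> {
        match p.tokens.get(p.pos).copied() {
            Some(tok) if T::peek(tok) => {
                p.pos += 1;
                Ok(T::build(tok))
            }
            other => Err(format!("unexpected token: {:?}", other)),
        }
    }
}

#[derive(Debug, PartialEq)]
enum LengthOrAuto {
    Length,
    Auto,
}

impl ToyPeek for LengthOrAuto {
    fn peek(next: ToyKind) -> bool {
        matches!(next, ToyKind::Dimension | ToyKind::Ident)
    }
}

impl ToyBuild for LengthOrAuto {
    fn build(token: ToyKind) -> Self {
        // No validity check needed here: `peek` already ruled out other kinds.
        if token == ToyKind::Dimension { Self::Length } else { Self::Auto }
    }
}
```

Note that the failing branch never advances the parser, so no checkpoint is needed for single token nodes in this sketch.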
§Convenience algorithms
For more complex algorithms where nodes might parse many child nodes or have some delicate or otherwise awkward steps, additional traits exist to make implementing AST nodes trivial for these use cases.
- StyleSheet - AST nodes representing a stylesheet should use this to, well, parse a stylesheet.
- Declaration - AST nodes representing a declaration (aka “property”) should use this to parse a declaration.
- AtRule - AST nodes representing any At Rule should use this to parse an AtRule.
- QualifiedRule - AST nodes representing a “Qualified Rule” (e.g. a style rule) should use this to parse a QualifiedRule.
- CompoundSelector - AST nodes representing a CSS selector should use this to parse a list of nodes implementing SelectorComponent.
- SelectorComponent - AST nodes representing an individual selector component, such as a tag or class or pseudo element, should use this to parse the set of specified selector components.
The *List traits are also available to more easily parse lists of things, such as preludes or blocks:
- PreludeList - AST nodes representing a rule’s prelude should use this. It simply repeatedly parses its items until it encounters the start of a block (<{-token> or <;-token>).
- FeatureConditionList - AST nodes representing a prelude “condition list” should use this. It parses the complex condition logic in rules like @media, @supports or @container.
- DeclarationList - AST nodes representing a block which can only accept “Declarations” should use this. This is an implementation of <declaration-list>.
- DeclarationRuleList - AST nodes representing a block which can accept either “At Rules” or “Declarations” but cannot accept “Qualified Rules” should use this. This is an implementation of <declaration-rule-list>.
- RuleList - AST nodes representing a block which can accept either “At Rules” or “Qualified Rules” but cannot accept “Declarations” should use this. This is an implementation of <rule-list>.
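The loop that PreludeList is described as performing (parse items until a <{-token> or <;-token> is seen) can be sketched as follows. This is a toy model under stated assumptions: ToyToken and parse_prelude are hypothetical, and every prelude item is reduced to a single ident.

```rust
// Hypothetical toy model of a prelude loop; not the css_parse implementation.

#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyToken<'a> {
    Ident(&'a str),
    LeftCurly,
    Semicolon,
}

/// Collect prelude items starting at `*pos`, stopping (without consuming)
/// at the first `{` or `;`, or at the end of input.
fn parse_prelude<'a>(tokens: &[ToyToken<'a>], pos: &mut usize) -> Vec<&'a str> {
    let mut items = Vec::new();
    while let Some(&tok) = tokens.get(*pos) {
        match tok {
            // The block (or end of statement) belongs to the caller.
            ToyToken::LeftCurly | ToyToken::Semicolon => break,
            ToyToken::Ident(name) => {
                items.push(name);
                *pos += 1;
            }
        }
    }
    items
}
```

Leaving the terminating token unconsumed means the caller can go on to parse the block with whichever block list suits the rule.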
The *Feature traits are also available to more easily parse “feature conditions”. These are the conditions supported in a FeatureConditionList, e.g. the conditions inside of @media, @container or @supports rules.
- RangedFeature - AST nodes representing a feature condition in the “ranged” context.
- BooleanFeature - AST nodes representing a feature condition in the “boolean” context.
- DiscreteFeature - AST nodes representing a feature condition with discrete keywords.
§Generic AST nodes
In addition to the traits which allow for parsing bespoke AST Nodes, this crate provides a set of generic AST node structs/enums which are capable of providing “general purpose” AST nodes, useful for when an AST node fails to parse and needs to consume some tokens in a generic manner, according to the rules of CSS Syntax:
- syntax::AtRule provides the generic <at-rule> grammar.
- syntax::QualifiedRule provides the generic <qualified-rule> grammar.
- syntax::Declaration provides the generic <declaration> grammar.
- syntax::BangImportant provides the <!important> grammar.
- syntax::ComponentValue provides the <component-value> grammar, used by other generic nodes.
- syntax::SimpleBlock provides the generic <simple-block> grammar.
- syntax::FunctionBlock provides the generic <function-block> grammar.
- syntax::ComponentValues provides a list of <component-value> nodes, per “parse a list of component values”.
- syntax::BadDeclaration provides a struct to capture the bad declaration steps.
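The fallback pattern this enables (try the specific node, and keep the tokens in a generic form if that fails) can be illustrated with a toy. ToyDeclaration and parse_declaration below are hypothetical; tokens are modelled as plain string slices, and the Generic variant merely stands in for the role a generic node such as syntax::Declaration might play.

```rust
// Hypothetical toy illustration of "specific node, else generic fallback".

#[derive(Debug, PartialEq)]
enum ToyDeclaration<'a> {
    /// The specific, fully understood form: `width: <n>px`.
    Width(u32),
    /// The fallback: the raw tokens, retained rather than discarded.
    Generic(Vec<&'a str>),
}

fn parse_declaration<'a>(tokens: &[&'a str]) -> ToyDeclaration<'a> {
    // First attempt the specific grammar.
    if let [name, colon, value] = tokens {
        if *name == "width" && *colon == ":" {
            let px = value.strip_suffix("px").and_then(|n| n.parse::<u32>().ok());
            if let Some(px) = px {
                return ToyDeclaration::Width(px);
            }
        }
    }
    // Otherwise consume the tokens generically so nothing is lost.
    ToyDeclaration::Generic(tokens.to_vec())
}
```

Keeping the unknown tokens around means the source can still be written back out unchanged, even when the specific grammar did not match.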
§Test Helpers
To make it much easier to test the functionality of AST nodes, enabling the testing feature will provide two testing macros which make setting up a test trivial.
- assert_parse! will parse the given string against the given node, asserting that it parses successfully and can be written back out to the same output.
- assert_parse_error! will parse the given string against the node, expecting the parse to fail.
It is advised to add the testing flag as a dev-dependencies feature to enable these only during test:
[dependencies]
css_parse = "*"
[dev-dependencies]
css_parse = { version = "*", features = ["testing"] }
§Example
A small example on how to define an AST node:
use css_parse::*;

#[derive(Debug)]
struct MyProperty {
    ident: T![Ident],
    colon: T![Colon],
    dimension: T![Dimension],
}

impl<'a> Parse<'a> for MyProperty {
    fn parse(p: &mut Parser<'a>) -> Result<Self> {
        let ident = p.parse::<T![Ident]>()?;
        let colon = p.parse::<T![Colon]>()?;
        let dimension = p.parse::<T![Dimension]>()?;
        Ok(Self { ident, colon, dimension })
    }
}

impl ToCursors for MyProperty {
    fn to_cursors(&self, s: &mut impl CursorSink) {
        // Each T! node implements ToCursors itself, so delegate to it.
        self.ident.to_cursors(s);
        self.colon.to_cursors(s);
        self.dimension.to_cursors(s);
    }
}

// Requires the "testing" feature.
assert_parse!(MyProperty, "width:1px");
Re-exports§
pub use syntax::*;
Modules§
- syntax - Various structs/enums that represent generic AST nodes.
- test_helpers - Test macros available if built with features = ["testing"].
- token_macros - Various macros that expand to AST nodes that wrap Tokens.
Macros§
- Optionals - A helper type for parsing optional CSS grammar patterns where items can appear in any order but at most once each (the || combinator in CSS grammar).
- T - The T! macro expands to the name of a type representing the Token of the same name. These can be used in struct fields to type child nodes.
- assert_parse - (Requires feature “testing”) Given a Node and a string, this will expand to code that sets up a parser and parses the given string against the given node. If the parse fails, this macro will panic with a readable failure. It then writes the result out using crate::CursorWriteSink, writing the parsed Node back out to a string. If the resulting string differs from the given string, the macro will panic with a readable failure.
- assert_parse_error - (Requires feature “testing”) Given a Node and a string, this will expand to code that sets up a parser and parses the given string against the given node. If the parse succeeds, this macro will panic with a readable failure.
- assert_parse_span - (Requires feature “testing”) Given a Node and a multiline string, this will expand to code that sets up a parser and parses the first line of the given string with the parser. It will then create a second string based on the span data and append it to the first line of the string, showing what was parsed and where the span rests.
- atkeyword_set - A macro for defining an enum which captures a token with Kind::AtKeyword that matches one of the variant names in the enum.
- boolean_feature - This macro expands to define an enum which already implements Parse and BooleanFeature, for a one-liner definition of a BooleanFeature.
- custom_delim - A macro for defining a struct which captures a Kind::Delim with a specific character.
- custom_double_delim - A macro for defining a struct which captures two adjacent Kind::Delim tokens, each with a specific character.
- discrete_feature - This macro expands to define an enum which already implements Parse and DiscreteFeature, for a one-liner definition of a DiscreteFeature.
- function_set - A macro for defining an enum which captures a token with Kind::Function that matches one of the variant names in the enum.
- keyword_set - A macro for defining an enum which captures a token with Kind::Ident that matches one of the variant names in the enum.
- parse - A macro for easily calling the Parser and entirely parsing a string.
- parse_optionals
- pseudo_class - A macro for defining pseudo classes.
- pseudo_element - A macro for defining pseudo elements.
- ranged_feature - This macro expands to define an enum which already implements Parse and RangedFeature, for a one-liner definition of a RangedFeature.
Structs§
- AssociatedWhitespaceRules - A [bitmask][bitmask_enum] representing rules around the whitespace surrounding a Kind::Delim token.
- Cursor - Wraps Token with a SourceOffset, which allows it to reason about the character data of the source text.
- CursorCompactWriteSink - This is a CursorSink that wraps a sink (impl SourceCursorSink) and on each CursorSink::append() call, will write the contents of the given Cursor into the given sink, using the given &'a str as the original source. Some tokens will not be output, and Whitespace tokens will always write out as a single ' '. It can be used as a light-weight minifier for ToCursors structs.
- CursorOverlaySet
- CursorOverlaySink - This is a CursorSink that wraps a SourceCursorSink, while also taking a CursorOverlaySet. As Cursors get appended into this sink, it will replay those to the underlying SourceCursorSink unless a CursorOverlaySet overlaps the Cursor’s span, at which point the overlay will be replayed to the underlying SourceCursorSink. This Sink is useful for collecting new Cursors (say from an AST) to overlap (or, say, transform) the underlying base Cursors (read: AST). In other words, writing over the top of the source.
- CursorPrettyWriteSink - This is a CursorSink that wraps a sink (impl SourceCursorSink) and on each CursorSink::append() call, will write the contents of the given Cursor into the given Writer, using the given &'a str as the original source. This also attempts to write additional newlines and indentation into the Writer to create a more aesthetically pleasing output. It can be used as a light-weight formatter for ToCursors structs.
- CursorToSourceCursorSink
- CursorWriteSink - This is a CursorSink that wraps a Writer (impl fmt::Write) and on each CursorSink::append() call, will write the contents of the given Cursor into the given Writer, using the given &'a str as the original source. This is useful as a way to turn Cursors into Strings or u8s (or files or whatever else implements Write).
- Error - Core Diagnostic wrapper type.
- Feature - A set of runtime feature flags which can be enabled individually or in combination, which will change the way Parser works.
- KindSet - Match a token against one or more Kinds.
- Optionals2
- Optionals3
- Optionals4
- Optionals5
- Parser
- ParserCheckpoint - Represents a point during the Parser’s lifecycle; retaining state that can then be rewound.
- ParserReturn
- SourceCursor - Wraps Cursor with a str that represents the underlying character data for this cursor.
- SourceOffset - Represents a position in the underlying source.
- Span - Represents a range of text within a document, as a Start and End offset.
- State
- Token - An abstract representation of the chunk of the source text, retaining certain “facts” about the source.
- Whitespace - A [bitmask][bitmask_enum] representing the characters that make up a Kind::Whitespace token.
Enums§
- Comparison - This enum represents a set of comparison operators, used in Ranged Media Features (see RangedFeature), and could be used in other parts of a CSS-alike language. This isn’t a strictly standard part of CSS, but is provided for convenience.
- ConditionKeyword
- DimensionUnit - Represents a Kind::Dimension’s unit, if it is “known”: defined by the CSS grammar.
- Kind - Kind represents the token “Type”, categorised mostly by the token types within the CSS Syntax spec.
- PairWise - Represents either the left or right Kind of a PairWise set.
- QuoteStyle - An enum representing the “Style” the Kind::String token represents.
Traits§
- BooleanFeature - This trait provides an implementation for parsing a “Media Feature” in the “Boolean” context. This is complementary to the other media features: RangedFeature and DiscreteFeature.
- Build - This trait allows AST nodes to construct themselves from a single Cursor from the Parser.
- CompoundSelector
- CursorSink - This trait provides the generic impl that ToCursors can use. This provides just enough API surface for nodes to put the cursors they represent into some buffer which can later be read, the details of which are elided.
- CursorSource - This trait provides the generic impl that ToCursors can use. This provides just enough API surface for nodes to put the cursors they represent into some buffer which can later be read, the details of which are elided.
- DeclarationValue - A trait that can be used for AST nodes representing a Declaration’s Value. It offers some convenience functions for handling such values.
- DiscreteFeature - This trait provides an implementation for parsing a “Media Feature” that has a discrete keyword. This is complementary to the other media features: BooleanFeature and RangedFeature.
- FeatureConditionList - This trait can be used for AST nodes representing a list of “Feature Conditions”. This is an amalgamation of Supports Conditions, Media Conditions, and Container Queries. This is an implementation of <at-rule-list>.
- Parse - This trait allows AST nodes to construct themselves from a mutable Parser instance.
- Peek - This trait allows AST nodes to indicate whether the Parser is in the right position to potentially Parse the node. Returning true from Peek is not a guarantee that a node will successfully parse; instead it offers an indication that the node can successfully parse the first node. This is useful for cheaply comparing a set of Nodes to see which one might viably parse, rather than calling Parser::try_parse() on each.
- PreludeList
- RangedFeature - This trait provides an implementation for parsing a “Media Feature” in the “Range” context.
- RangedFeatureKeyword
- RuleVariants - A trait that can be used for AST nodes representing a Declaration’s Value. It offers some convenience functions for handling such values.
- SelectorComponent
- SourceCursorSink
- StyleSheet - This trait provides an implementation for parsing a StyleSheet.
- ToCursors - This trait allows AST nodes to decompose themselves back into a set of (ordered) Cursors.
- ToNumberValue
- ToSpan - A trait representing an object that can derive its own Span. This is very similar to From<MyStruct> for Span, however From<MyStruct> for Span requires Sized, meaning it is not dyn compatible.
Type Aliases§
- Result - Type alias for Result<T, Report>.