Lexer
The Lexer class implements the Parser interface and provides tokenization and parsing utilities for schema code. It manages a dictionary of token definitions and handles the parsing process by matching patterns against the input code.
The Lexer class implements the Parser interface and provides tokenization and parsing utilities for schema code. It manages a dictionary of token definitions and handles the parsing process by matching patterns against the input code.
import { Lexer } from '@stackpress/idea-parser';
- Properties
- Methods
- Parsing Complex Data Structures
- Error Handling
- Predefined Token Definitions
- Usage with AST
- Advanced Usage Patterns
1. Properties
The following properties are available when instantiating a Lexer.
| Property | Type | Description |
|---|---|---|
index | number | Current parsing position in the code |
2. Methods
The following methods are available when instantiating a Lexer.
2.1. Cloning the Lexer
The following example shows how to create an exact copy of the lexer's current state.
import { Lexer } from '@stackpress/idea-parser';
const lexer = new Lexer();
lexer.load('enum Status { ACTIVE "Active" }');
lexer.define('enum', enumReader);
// Clone preserves code, index position, and all definitions
const clonedLexer = lexer.clone();
console.log(clonedLexer.index); // Same index as original
console.log(clonedLexer.dictionary); // Same definitions as original
Returns
A new Lexer instance with identical state to the original.
2.2. Defining Token Patterns
The following example shows how to register token definitions for parsing.
import { Lexer } from '@stackpress/idea-parser';
import definitions from '@stackpress/idea-parser/definitions';
const lexer = new Lexer();
// Load all predefined token definitions
Object.keys(definitions).forEach((key) => {
lexer.define(key, definitions[key]);
});
// Define a custom token
lexer.define('customKeyword', (code, start, lexer) => {
if (code.substring(start, start + 6) === 'custom') {
return {
type: 'Keyword',
value: 'custom',
start: start,
end: start + 6
};
}
return undefined;
});
Parameters
| Parameter | Type | Description |
|---|---|---|
reader | Reader | Function that attempts to match and parse the token |
Returns
Void (modifies the lexer's internal dictionary).
2.3. Expecting Specific Tokens
The following example shows how to require specific tokens at the current position.
import { Lexer } from '@stackpress/idea-parser';
import definitions, { data } from '@stackpress/idea-parser/definitions';
const lexer = new Lexer();
Object.keys(definitions).forEach((key) => {
lexer.define(key, definitions[key]);
});
// Parse a float literal
lexer.load('4.4');
const floatToken = lexer.expect('Float');
console.log(floatToken.value); // 4.4
console.log(floatToken.start); // 0
console.log(floatToken.end); // 3
// Parse any data type (scalar, object, or array)
lexer.load('"hello world"');
const stringToken = lexer.expect(data);
console.log(stringToken.value); // 'hello world'
// Expect one of multiple token types
lexer.load('true');
const booleanToken = lexer.expect(['Boolean', 'String', 'Integer']);
console.log(booleanToken.value); // true
Parameters
| Parameter | Type | Description |
|---|
Returns
The matched token object, or throws an exception if no match is found.
2.4. Getting Token Definitions
The following example shows how to retrieve registered token definitions.
import { Lexer } from '@stackpress/idea-parser';
const lexer = new Lexer();
lexer.define('String', definitions['String']);
const definition = lexer.get('String');
if (definition) {
console.log('Definition key:', definition.key); // 'String'
console.log('Reader function:', typeof definition.reader); // 'function'
} else {
console.log('Definition not found');
}
Parameters
| Parameter | Type | Description |
|---|
Returns
The Definition object if found, undefined otherwise.
2.5. Loading Code
The following example shows how to load code for parsing.
import { Lexer } from '@stackpress/idea-parser';
const lexer = new Lexer();
// Load code from the beginning
lexer.load('enum Status { ACTIVE "Active" }');
console.log(lexer.index); // 0
// Load code starting from a specific position
lexer.load('enum Status { ACTIVE "Active" }', 5);
console.log(lexer.index); // 5
console.log(lexer.substring(5, 11)); // 'Status'
Parameters
| Parameter | Type | Description |
|---|---|---|
index | number | Starting position in the code (default: 0) |
Returns
The Lexer instance to allow method chaining.
2.6. Matching Tokens
The following example shows how to find the first matching token from available definitions.
import { Lexer } from '@stackpress/idea-parser';
const lexer = new Lexer();
lexer.define('literal', (code, start) => {
if (code.startsWith('42', start)) {
return { type: 'Literal', value: 42, start, end: start + 2 };
}
return undefined;
});
const code = '42';
// Match against specific token types
const match = lexer.match(code, 0, ['literal']);
if (match) {
console.log('Matched:', match.type); // 'Literal'
console.log('Value:', match.value); // 42
}
// Match against all available definitions (pass undefined for keys)
const match2 = lexer.match('42', 0, undefined);
if (match2) {
console.log('Matched:', match2.type); // 'Literal'
}
Parameters
| Parameter | Type | Description |
|---|---|---|
start | number | Starting position for matching |
keys | string[] | Optional array of definition keys to try (default: all definitions) |
Returns
The first matching token object, or null if no match is found.
2.7. Testing for Next Tokens
The following example shows how to check if the next tokens match without consuming them.
import { Lexer } from '@stackpress/idea-parser';
import definitions from '@stackpress/idea-parser/definitions';
const lexer = new Lexer();
Object.keys(definitions).forEach((key) => {
lexer.define(key, definitions[key]);
});
lexer.load('enum Status { ACTIVE "Active" }');
// Test if next token matches (doesn't advance index)
if (lexer.next('AnyIdentifier')) {
console.log('Next token is an identifier');
console.log(lexer.index); // Still 0
}
// Test for multiple possible tokens
if (lexer.next(['String', 'Integer', 'Boolean'])) {
console.log('Next token is a literal value');
}
Parameters
| Parameter | Type | Description |
|---|
Returns
true if the next token(s) match, false otherwise. Does not advance the index.
2.8. Optional Token Parsing
The following example shows how to optionally parse tokens without throwing errors.
import { Lexer } from '@stackpress/idea-parser';
import definitions from '@stackpress/idea-parser/definitions';
const lexer = new Lexer();
Object.keys(definitions).forEach((key) => {
lexer.define(key, definitions[key]);
});
lexer.load('/* some comment */ enum Status');
// Try to parse optional whitespace/comments
const comment = lexer.optional('note');
if (comment) {
console.log('Found comment:', comment.value); // '/* some comment */'
}
// Skip optional whitespace
lexer.optional('whitespace');
// Now parse the enum keyword
const enumToken = lexer.expect('AnyIdentifier');
console.log(enumToken.name); // 'enum'
Parameters
| Parameter | Type | Description |
|---|
Returns
The matched token object if found, undefined otherwise.
2.9. Reading Ahead
The following example shows how to read the next available token.
import { Lexer } from '@stackpress/idea-parser';
const lexer = new Lexer();
lexer.load(''); // Empty string
// Read the next token (tries all definitions)
const nextToken = lexer.read();
console.log(nextToken); // undefined (no tokens in empty string)
lexer.load('42');
lexer.define('Integer', definitions['Integer']);
const token = lexer.read();
if (token) {
console.log('Token type:', token.type); // 'Literal'
console.log('Token value:', token.value); // 42
}
Returns
The next available token object, or undefined if no token can be parsed.
2.10. Getting Substrings
The following example shows how to extract portions of the source code.
import { Lexer } from '@stackpress/idea-parser';
const lexer = new Lexer();
lexer.load('some code');
// Extract specific portions of code
const substring = lexer.substring(5, 9);
console.log(substring); // 'code'
// Return empty string when start and end are the same
const empty = lexer.substring(5, 5);
console.log(empty); // ''
Parameters
| Parameter | Type | Description |
|---|---|---|
end | number | Ending position in the code |
Returns
The substring of code between start and end positions.
2.11. Finding Next Space
The following example shows how to find the next whitespace character (useful for error reporting).
import { Lexer } from '@stackpress/idea-parser';
const lexer = new Lexer();
lexer.load('enum Status { ACTIVE "Active" }');
// Find next space from current position
const spaceIndex = lexer.nextSpace();
console.log(spaceIndex); // 4 (position of space after 'enum')
// If no space found, returns code length
lexer.load('enumStatus');
const endIndex = lexer.nextSpace();
console.log(endIndex); // 10 (length of code)
Returns
The index of the next space character, or the code length if no space is found.
3. Parsing Complex Data Structures
The Lexer can parse complex nested data structures using the predefined token definitions:
3.1. Parsing Objects
The following examples demonstrate how to parse object literals using the Lexer. Objects in the schema language use key-value pairs without colons, where keys are identifiers and values can be any data type.
import { Lexer } from '@stackpress/idea-parser';
import { Compiler } from '@stackpress/idea-parser';
import definitions, { data } from '@stackpress/idea-parser/definitions';
const lexer = new Lexer();
Object.keys(definitions).forEach((key) => {
lexer.define(key, definitions[key]);
});
// Parse a simple object
lexer.load('{ foo "bar" bar 4.4 }');
const objectToken = lexer.expect('Object');
const compiled = Compiler.object(objectToken);
console.log(compiled); // { foo: 'bar', bar: 4.4 }
// Parse nested objects
lexer.load('{ foo "bar" zoo { foo false bar null } }');
const nestedToken = lexer.expect('Object');
const nestedCompiled = Compiler.object(nestedToken);
console.log(nestedCompiled.zoo.foo); // false
console.log(nestedCompiled.zoo.bar); // null
3.2. Parsing Arrays
// Parse a simple array
lexer.load('[ 4.4 "bar" false null ]');
const arrayToken = lexer.expect('Array');
const compiledArray = Compiler.array(arrayToken);
console.log(compiledArray); // [4.4, 'bar', false, null]
// Parse nested arrays
lexer.load('[ 4.4 "bar" [ 4 true ] ]');
const nestedArrayToken = lexer.expect('Array');
const nestedArray = Compiler.array(nestedArrayToken);
console.log(nestedArray[2]); // [4, true]
// Parse array of objects
lexer.load('[ { label "US" value "United States" } { label "CA" value "Canada" } ]');
const objectArrayToken = lexer.expect('Array');
const objectArray = Compiler.array(objectArrayToken);
console.log(objectArray[0].label); // 'US'
console.log(objectArray[1].value); // 'Canada'
3.3. Parsing Comments
The following examples show how to parse different types of comments in the schema language. Comments can be either single-line comments starting with // or multi-line block comments enclosed in /* */.
// Parse block comments
lexer.load('/* some comment */');
const noteToken = lexer.expect('note');
console.log(noteToken.type); // '_Note'
console.log(noteToken.value); // '/* some comment */'
// Parse line comments
lexer.load('//some comment');
const commentToken = lexer.expect('comment');
console.log(commentToken.type); // '_Comment'
console.log(commentToken.value); // '//some comment'
// Parse multiline block comments
lexer.load(`/*
some
// comment
*/`);
const multilineToken = lexer.expect('note');
console.log(multilineToken.value); // Contains newlines and nested //
4. Error Handling
The Lexer provides detailed error information when parsing fails:
Unknown Definitions
const lexer = new Lexer();
// Throws: "Unknown definition unknownKey"
try {
lexer.expect('unknownKey');
} catch (error) {
console.log(error.message); // "Unknown definition unknownKey"
}
// Throws: "Unknown definition missingKey"
try {
lexer.match('some code', 0, ['missingKey']);
} catch (error) {
console.log(error.message); // "Unknown definition missingKey"
}
Unexpected Tokens
import { Exception } from '@stackpress/idea-parser';
const lexer = new Lexer();
lexer.define('String', definitions['String']);
lexer.load('42'); // Number, not string
try {
lexer.expect('String'); // Expecting string but found number
} catch (error) {
if (error instanceof Exception) {
console.log(error.message); // "Unexpected 42 expecting String"
console.log(error.start); // Position where error occurred
console.log(error.end); // End position for error highlighting
}
}
5. Predefined Token Definitions
The library comes with comprehensive predefined token definitions:
Literals
Null: MatchesnullkeywordBoolean: MatchestrueandfalseString: Matches quoted strings"hello"Float: Matches decimal numbers4.4,-3.14Integer: Matches whole numbers42,-10Environment: Matches environment variablesenv("VAR_NAME")
Identifiers
AnyIdentifier: Matches any valid identifier[a-z_][a-z0-9_]*UpperIdentifier: Matches uppercase identifiers[A-Z_][A-Z0-9_]*CapitalIdentifier: Matches capitalized identifiers[A-Z_][a-zA-Z0-9_]*CamelIdentifier: Matches camelCase identifiers[a-z_][a-zA-Z0-9_]*LowerIdentifier: Matches lowercase identifiers[a-z_][a-z0-9_]*AttributeIdentifier: Matches attribute identifiers@field.input
Structural
Array: Matches array expressions[ ... ]Object: Matches object expressions{ ... }{,},[,],(,): Bracket and brace tokens!: Final modifier token
Whitespace and Comments
whitespace: Matches any whitespace\s+space: Matches spaces onlyline: Matches line breaksnote: Matches block comments/* ... */comment: Matches line comments// ...
Data Groups
scalar: Array of scalar types['Null', 'Boolean', 'String', 'Float', 'Integer', 'Environment']data: Array of all data types[...scalar, 'Object', 'Array']
6. Usage with AST
The Lexer is typically used by AST classes for parsing specific language constructs:
import { Lexer, EnumTree } from '@stackpress/idea-parser';
// AST classes configure lexers with appropriate definitions
const lexer = new Lexer();
EnumTree.definitions(lexer); // Adds enum-specific token definitions
lexer.load('enum Status { ACTIVE "Active" INACTIVE "Inactive" }');
const enumTree = new EnumTree(lexer);
const result = enumTree.enum(); // Parse enum using configured lexer
7. Advanced Usage Patterns
The following patterns demonstrate advanced techniques for using the Lexer in complex parsing scenarios. These patterns are useful for implementing sophisticated parsing logic and error recovery mechanisms.
7.1. Backtracking with Clone
Backtracking allows you to save the lexer's state and restore it if parsing fails. This is useful for trying alternative parsing strategies without losing your position in the code.
// Save current state for potential backtracking
const checkpoint = lexer.clone();
try {
// Try to parse complex structure
const result = parseComplexStructure(lexer);
return result;
} catch (error) {
// Restore state and try alternative parsing
const restoredLexer = checkpoint;
return parseAlternativeStructure(restoredLexer);
}
7.2. Conditional Parsing
Conditional parsing allows you to make parsing decisions based on lookahead tokens. This technique is useful for parsing languages with context-sensitive grammar rules.
// Parse different types based on lookahead
if (lexer.next('AnyIdentifier')) {
const identifier = lexer.expect('AnyIdentifier');
if (identifier.name === 'enum') {
// Parse enum declaration
} else if (identifier.name === 'model') {
// Parse model declaration
}
}
7.3. Custom Reader Functions
Reader functions should follow this pattern:
import type { Reader } from '@stackpress/idea-parser';
const customReader: Reader = (code, start, lexer) => {
// Check if pattern matches at current position
if (!code.startsWith('custom', start)) {
return undefined; // No match
}
// Calculate end position
const end = start + 6;
// Return token object
return {
type: 'CustomToken',
value: 'custom',
start: start,
end: end
};
};
// Register the custom reader
lexer.define('custom', customReader);