Lexer: Resources

What is a lexical token?

> A lexical token is a sequence of characters that can be treated as a unit in the grammar of a programming language. A programming language classifies lexical tokens into a finite set of token types.
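To make the definition concrete, here is one possible token representation in Python. The `TokenType` members and the `offset` field are hypothetical choices for illustration. The point is that every token pairs one type from a finite set with the characters (the lexeme) it was built from.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    """A small, hypothetical set of token types; a real language defines its own."""
    IDENTIFIER = auto()
    NUMBER = auto()
    ASSIGN = auto()
    EOF = auto()


@dataclass(frozen=True)
class Token:
    type: TokenType  # which of the finite token types this is
    lexeme: str      # the exact characters that formed the token
    offset: int      # position of the token's first character in the source


# The input "x = 42" viewed as a token stream (whitespace is not a token here):
tokens = [
    Token(TokenType.IDENTIFIER, "x", 0),
    Token(TokenType.ASSIGN, "=", 2),
    Token(TokenType.NUMBER, "42", 4),
    Token(TokenType.EOF, "", 6),
]
```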

Two common approaches to implementing a scanner:

1. Using regular expressions that define tokens
2. Through explicit control

The former is rather straightforward and is the basis of popular scanner generators (lex, flex, etc.). The latter means explicitly coding each possible transition of a DFA.
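A minimal sketch of the regular-expression approach, using Python's `re` module. The token names and patterns in `TOKEN_SPEC` are hypothetical. It applies the same disambiguation rules flex documents: take the longest match, and on a tie prefer the rule listed first. That is why `if` is a keyword while `iffy` is an identifier. Unlike lex/flex, it does not compile the rules into a single DFA. It simply tries every rule at each position, which is slower but easy to follow.

```python
import re
from typing import Iterator

# Hypothetical token specification: (token type, regular expression).
# Order matters only for ties: an earlier rule beats a later one of equal length.
TOKEN_SPEC = [
    ("IF", r"if"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("EQ", r"=="),
    ("ASSIGN", r"="),
    ("SKIP", r"[ \t\n]+"),  # whitespace: matched, then discarded
]
RULES = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPEC]


def scan(text: str) -> Iterator[tuple[str, str]]:
    """Yield (token type, lexeme) pairs using longest match, first rule on ties."""
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in RULES:
            m = regex.match(text, pos)  # anchored match starting at pos
            # Only a strictly longer match replaces the current best,
            # so the earliest rule wins when lengths are equal.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if best_name != "SKIP":
            yield best_name, text[pos:best_end]
        pos = best_end
```

For example, `list(scan("if iffy == 10"))` yields `[("IF", "if"), ("IDENT", "iffy"), ("EQ", "=="), ("NUMBER", "10")]`.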

An example, in pseudocode, of an explicit-control scanner that recognizes a `//` comment terminated by an end of line:

```pseudocode
CurrentChar <- Read()
if CurrentChar = '/'
then
    CurrentChar <- Read()
    if CurrentChar = '/'
    then
        repeat
            CurrentChar <- Read()
        until CurrentChar in {EOF, EOL}
    else Error()
else Error()
if CurrentChar = EOL
then Accept(CurrentChar)
else Error()
```
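A runnable Python translation of the comment-scanner pseudocode, offered as a sketch. Some names are assumptions standing in for the pseudocode's primitives. Reading from a string stands in for `Read()`, the `None` marker stands in for `EOF`, and the hypothetical `ScanError` exception stands in for `Error()`. Each branch hard-codes one DFA transition, which is what makes this an explicit-control scanner.

```python
EOF = None  # hypothetical end-of-input marker returned by read()
EOL = "\n"


class ScanError(Exception):
    """Hypothetical error type standing in for the pseudocode's Error()."""


def scan_line_comment(text: str) -> str:
    """Recognize a `//` comment terminated by EOL; return the lexeme without the EOL."""
    pos = 0

    def read():
        # Return the next character, or EOF once the input is exhausted.
        nonlocal pos
        if pos >= len(text):
            return EOF
        ch = text[pos]
        pos += 1
        return ch

    current = read()
    if current != "/":  # start state: need the first '/'
        raise ScanError("expected '/'")
    current = read()
    if current != "/":  # after one '/': need a second '/'
        raise ScanError("expected second '/'")
    while True:  # comment-body state: loop until EOL or end of input
        current = read()
        if current in (EOF, EOL):
            break
    if current == EOL:
        return text[: pos - 1]  # accept: everything before the EOL
    raise ScanError("comment not terminated by end of line")
```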

For either approach, it is good practice to draw the NFA/DFA models (and to leverage standard algorithms, e.g. `addDFA` and `epsilon-closure`[1]) as a way to help organize and structure the code.
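The epsilon-closure algorithm, together with the subset construction that uses it to turn an NFA into a DFA, can be sketched in Python. This is the textbook formulation, not necessarily what `addDFA` refers to. The dict-based NFA encoding and the `EPSILON` label are assumptions made for illustration.

```python
from collections import deque

EPSILON = ""  # hypothetical label used for epsilon transitions

# Hypothetical NFA encoding: state -> input symbol -> set of next states.
NFA = dict[int, dict[str, set[int]]]


def epsilon_closure(nfa: NFA, states: set[int]) -> frozenset[int]:
    """All states reachable from `states` using only epsilon transitions."""
    closure = set(states)
    worklist = deque(states)
    while worklist:
        state = worklist.popleft()
        for nxt in nfa.get(state, {}).get(EPSILON, set()):
            if nxt not in closure:
                closure.add(nxt)
                worklist.append(nxt)
    return frozenset(closure)


def subset_construction(nfa: NFA, start: int, alphabet: set[str]):
    """Build a DFA whose states are epsilon-closed sets of NFA states."""
    start_set = epsilon_closure(nfa, {start})
    dfa: dict[frozenset[int], dict[str, frozenset[int]]] = {}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            # "move": NFA states reachable on `symbol` from any state in `current`
            moved = {n for s in current for n in nfa.get(s, {}).get(symbol, set())}
            if not moved:
                continue
            target = epsilon_closure(nfa, moved)
            dfa[current][symbol] = target
            if target not in dfa:
                worklist.append(target)
    return start_set, dfa


# Thompson-style NFA for the regular expression a|b (accepting state: 5).
example_nfa: NFA = {
    0: {EPSILON: {1, 3}},
    1: {"a": {2}},
    3: {"b": {4}},
    2: {EPSILON: {5}},
    4: {EPSILON: {5}},
}
```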

> While the code in Figures 2.5 and 2.6 serves to illustrate the nature of a scanner, we emphasize that the most reliable and expedient methods for constructing scanners do so automatically from regular expressions, as covered in Chapter 3. Such scanners are reasonably efficient and correct by construction, given a correct set of regular-expression specifications for the tokens.
>
> Crafting a Compiler

In either case, our lexer should extract tokens using the [maximal munch](https://en.wikipedia.org/wiki/Maximal_munch) principle: at each position, consume the longest prefix of the remaining input that forms a valid token.
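A minimal sketch of maximal munch applied to punctuators, in Python. The `PUNCTUATORS` set is a hypothetical subset modeled on JavaScript's `>`-family operators. For simplicity the sketch tries candidate lengths from longest to shortest. A hand-written scanner would more typically peek one character at a time and extend the token while a longer operator is still possible, but the resulting token boundaries are the same.

```python
# Hypothetical subset of punctuators, modeled on JavaScript's shift/compare operators.
PUNCTUATORS = {">", ">>", ">>>", ">=", ">>=", ">>>=", "=", "=="}
LONGEST = max(len(p) for p in PUNCTUATORS)


def munch_punctuator(text: str, pos: int) -> str:
    """Return the longest punctuator starting at `pos` (maximal munch)."""
    for length in range(LONGEST, 0, -1):
        candidate = text[pos : pos + length]
        if candidate in PUNCTUATORS:
            return candidate
    raise SyntaxError(f"no punctuator at offset {pos}")


def split_punctuators(text: str) -> list[str]:
    """Split a run of punctuator characters, always taking the longest token."""
    out, pos = [], 0
    while pos < len(text):
        tok = munch_punctuator(text, pos)
        out.append(tok)
        pos += len(tok)
    return out
```

With this rule, `>>>=` is a single token rather than `>`, `>`, `>`, `=`, and `>>==` splits as `>>=`, `=`. Maximal munch is greedy and never backtracks, which is also its known weakness. In C, `x+++++y` lexes as `x ++ ++ + y` and fails to parse, even though `x ++ + ++ y` would have been valid.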


## Resource sink

### Existing implementations

- [meriyah](https://github.com/meriyah/meriyah/blob/a7be1edeaec2e37640f01b15d69b9b859abdf4e6/src/lexer/scan.ts#L241)

### Theoretical reference

- http://www.semware.com/html/01-lex.html
- Appel, Andrew — Modern Compiler Implementation (2004)
