Structures and functions for separating a character buffer into lexemes.
Detailed Description
The lexer reads through a buffer of characters (typically read from standard input), strips whitespace, and breaks the remaining characters into lexemes: logical atoms of character strings that may be passed on to later stages such as a tokenizer.
- Author:
- Justin J. Meza
- Date:
- 2010
Function Documentation
Adds a Lexeme structure to a LexemeList structure.
- Precondition:
- list was created by createLexemeList(void).
- lexeme was created by createLexeme(char *, const char *, unsigned int).
- Postcondition:
- lexeme will be appended to the end of list and the size of list will be updated accordingly.
- Returns:
- A pointer to the added Lexeme structure (will be the same as lexeme).
- Return values:
-
| NULL | realloc was unable to allocate memory. |
- Parameters:
-
| [in,out] | list | A pointer to the LexemeList structure to add lexeme to. |
| [in] | lexeme | A pointer to the Lexeme structure to add to list. |
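The append behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the field layouts of Lexeme and LexemeList and the name addLexemeSketch are hypothetical, since this section does not show them. Only the documented contract (append to the end, update the size, return NULL when realloc fails) is taken from the text.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layouts; the real definitions are not shown in this section. */
typedef struct {
    char *image;
    const char *fname;
    unsigned int line;
} Lexeme;

typedef struct {
    unsigned int num;  /* number of lexemes currently stored */
    Lexeme **lexemes;  /* growable array of pointers to lexemes */
} LexemeList;

/* Appends lexeme to the end of list.
 * Returns lexeme on success, or NULL if realloc could not allocate memory. */
Lexeme *addLexemeSketch(LexemeList *list, Lexeme *lexeme)
{
    Lexeme **mem;
    assert(list != NULL && lexeme != NULL);
    /* Use a temporary so the existing array is not lost if realloc fails. */
    mem = realloc(list->lexemes, sizeof(Lexeme *) * (list->num + 1));
    if (mem == NULL) return NULL;
    list->lexemes = mem;
    list->lexemes[list->num] = lexeme;
    list->num++;
    return lexeme;
}
```

In this sketch a failed realloc leaves list unchanged, so the caller can still release the lexemes already stored.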
Lexeme* createLexeme(char *image, const char *fname, unsigned int line)
Creates a Lexeme structure.
- Returns:
- A pointer to a Lexeme structure with the desired properties.
- Return values:
-
| NULL | malloc was unable to allocate memory. |
- See also:
- deleteLexeme(Lexeme *)
- Note:
- fname is not copied; only one copy is stored, and it is shared by all Lexeme structures that refer to it.
- Parameters:
-
| [in] | image | An array of characters that describe the lexeme. |
| [in] | fname | A pointer to the name of the file containing the lexeme. |
| [in] | line | The line number from the source file that the lexeme occurred on. |
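The ownership rule in the note above can be illustrated with a short sketch. It assumes a hypothetical Lexeme layout and assumes that image is copied into the structure (the documentation only states that fname is not); the names createLexemeSketch and deleteLexemeSketch are illustrative, and the cleanup behavior shown is an assumption because deleteLexeme is not described in this section.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout; the real definition is not shown in this section. */
typedef struct {
    char *image;        /* private copy of the lexeme text (assumed) */
    const char *fname;  /* shared pointer, not owned by the lexeme */
    unsigned int line;
} Lexeme;

/* Returns a new Lexeme, or NULL if malloc could not allocate memory. */
Lexeme *createLexemeSketch(char *image, const char *fname, unsigned int line)
{
    Lexeme *ret;
    assert(image != NULL);
    ret = malloc(sizeof(Lexeme));
    if (ret == NULL) return NULL;
    ret->image = malloc(strlen(image) + 1);
    if (ret->image == NULL) {
        free(ret);  /* do not leak the partially built structure */
        return NULL;
    }
    strcpy(ret->image, image);
    ret->fname = fname;  /* pointer is stored as-is; the string is not copied */
    ret->line = line;
    return ret;
}

/* Assumed counterpart: frees the copied image but never the shared fname. */
void deleteLexemeSketch(Lexeme *lexeme)
{
    if (lexeme == NULL) return;
    free(lexeme->image);
    free(lexeme);
}
```

Because fname is shared, the caller in this sketch must keep the file name alive for as long as any lexeme refers to it.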
void deleteLexeme(Lexeme *lexeme)
LexemeList* scanBuffer(const char *buffer, unsigned int size, const char *fname)
Scans through a character buffer, removing unnecessary characters and generating lexemes.
Lexemes are separated by whitespace, except that each newline character is kept as its own lexeme. String literals are handled differently: starting at the first quotation character, characters are collected until either an unescaped quotation character or a newline or carriage return character is read, whichever comes first. A quotation character is escaped when it is preceded by a colon that is not itself preceded by a colon. This handles the odd case of strings such as "::", which print a single colon. The effects of commas, ellipses, and bangs (!) are also handled.
- Precondition:
- size is the number of characters starting at the memory location pointed to by buffer.
- Returns:
- A pointer to a LexemeList structure.
- Parameters:
-
| [in] | buffer | An array of characters to scan for lexemes. |
| [in] | size | The number of characters in buffer. |
| [in] | fname | An array of characters representing the name of the file used to read buffer. |
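The string-literal rule described above can be sketched as a small helper. This is an illustration of the documented two-character lookbehind only, not the actual scanBuffer code; the function name stringLexemeLength and its start parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of the string literal that begins at buffer[start],
 * which must be a quotation character. The closing quote is included when
 * one is found; a newline or carriage return ends the literal early and is
 * not included. */
size_t stringLexemeLength(const char *buffer, size_t size, size_t start)
{
    size_t i;
    assert(buffer != NULL && start < size && buffer[start] == '"');
    for (i = start + 1; i < size; i++) {
        char c = buffer[i];
        if (c == '\n' || c == '\r') break;  /* unterminated literal */
        if (c == '"') {
            /* Escaped only if preceded by a colon that is not itself
             * preceded by a colon. buffer[i - 2] is read only when
             * buffer[i - 1] is a colon, so it never precedes start. */
            int escaped = buffer[i - 1] == ':' && buffer[i - 2] != ':';
            if (!escaped) return i - start + 1;
        }
    }
    return i - start;  /* stopped at a line break or the end of the buffer */
}
```

Under this rule the closing quote of "::" is treated as unescaped, so the literal spans four characters, while in ":"" the first inner quote is escaped and the literal ends at the second one.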