Structures and functions for grouping lexemes into tokens.
Data Structures | |
| struct | Token |
| Stores a token and any value parsed by the tokenizer. | |
| union | TokenData |
| Stores the data associated with a Token structure. | |
Enumerations | |
| enum | TokenType { TT_INTEGER, TT_FLOAT, TT_STRING, TT_IDENTIFIER, TT_BOOLEAN, TT_IT, TT_NOOB, TT_NUMBR, TT_NUMBAR, TT_TROOF, TT_YARN, TT_EOF, TT_NEWLINE, TT_HAI, TT_KTHXBYE, TT_HASA, TT_ITZ, TT_R, TT_ANYR, TT_AN, TT_SUMOF, TT_DIFFOF, TT_PRODUKTOF, TT_QUOSHUNTOF, TT_MODOF, TT_BIGGROF, TT_SMALLROF, TT_BOTHOF, TT_EITHEROF, TT_WONOF, TT_NOT, TT_MKAY, TT_ALLOF, TT_ANYOF, TT_BOTHSAEM, TT_DIFFRINT, TT_MAEK, TT_A, TT_ISNOWA, TT_VISIBLE, TT_SMOOSH, TT_BANG, TT_GIMMEH, TT_ORLY, TT_YARLY, TT_MEBBE, TT_NOWAI, TT_OIC, TT_WTF, TT_OMG, TT_OMGWTF, TT_GTFO, TT_IMINYR, TT_UPPIN, TT_NERFIN, TT_YR, TT_TIL, TT_WILE, TT_IMOUTTAYR, TT_HOWDUZ, TT_IFUSAYSO, TT_FOUNDYR, TT_ENDOFTOKENS } |
Denotes the type of token present. | |
Functions | |
| unsigned int | acceptLexemes (LexemeList *, unsigned int, const char *) |
| Tries to match a sequence of lexemes. | |
| Token * | addToken (Token ***, unsigned int *, Token *) |
| Adds a Token to an array of Token structures. | |
| Token * | createToken (TokenType, const char *, const char *, unsigned int) |
| Creates a Token structure. | |
| void | deleteToken (Token *) |
| Deletes a Token structure. | |
| void | deleteTokens (Token **) |
| Deletes an array of Token structures. | |
| int | isFloat (const char *) |
| Checks if a string of characters follows the format for a floating point decimal. | |
| int | isIdentifier (const char *) |
| Checks if a string of characters follows the format for an identifier. | |
| int | isInteger (const char *) |
| Checks if a string of characters follows the format for an integer. | |
| Token * | isKeyword (LexemeList *, unsigned int *) |
| Checks if a sequence of lexemes is a keyword. | |
| int | isString (const char *) |
| Checks if a string of characters follows the format for a string. | |
| Token ** | tokenizeLexemes (LexemeList *) |
| Converts a list of lexemes into tokens. | |
Structures and functions for grouping lexemes into tokens.
The tokenizer reads through an array of lexemes (generated by the lexer) and groups them into tokens based on their structure. In addition, some lexemes with semantic meaning (such as integers, floats, strings, and booleans) will have their values extracted and stored.
| enum TokenType |
Denotes the type of token present.
All of the token type names are self-explanatory. Each corresponds either to the semantic type of the token's data (in the case of TT_INTEGER, TT_FLOAT, TT_STRING, or TT_IDENTIFIER) or to the lexemes that make up the particular token.
| unsigned int acceptLexemes | ( | LexemeList * | lexemes, | |
| unsigned int | start, | |||
| const char * | match | |||
| ) |
Tries to match a sequence of lexemes.
Scans through lexemes starting at start and tries to match space-delimited lexemes from match.
| [in] | lexemes | A pointer to a LexemeList structure to match lexemes from. |
| [in] | start | The position within lexemes to start matching at. |
| [in] | match | A pointer to a character array describing the sequence of lexemes to match. |
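A minimal sketch of the matching loop described above, assuming a hypothetical, simplified LexemeList that holds only an array of lexeme images and a count (lci's real structure differs). The reference does not document the return value, so returning the number of lexemes matched (0 on any mismatch) is an assumption; a count is convenient for a caller such as isKeyword that must advance its position past a multi-lexeme keyword.

```c
#include <string.h>

/* Hypothetical stand-in for LexemeList: only what this sketch needs. */
typedef struct {
    const char **images;  /* text of each lexeme, in source order */
    unsigned int num;     /* number of entries in images */
} LexemeList;

/* Sketch: compare the space-delimited words of match against consecutive
 * lexemes beginning at start. Returns the number of lexemes matched when
 * every word matches, or 0 otherwise (return convention is assumed). */
unsigned int acceptLexemes(LexemeList *lexemes, unsigned int start,
                           const char *match)
{
    unsigned int count = 0;
    const char *word = match;
    while (*word != '\0') {
        /* Length of the current word (up to the next space or the end). */
        size_t len = strcspn(word, " ");
        if (start + count >= lexemes->num)
            return 0;  /* ran out of lexemes before match was exhausted */
        const char *image = lexemes->images[start + count];
        if (strlen(image) != len || strncmp(image, word, len) != 0)
            return 0;  /* this lexeme differs from the expected word */
        count++;
        word += len;
        while (*word == ' ')
            word++;    /* skip the delimiter before the next word */
    }
    return count;
}
```

For example, with the lexemes I, HAS, A, var, matching "I HAS A" from position 0 consumes three lexemes, while "I HAS B" fails on the third.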
Adds a Token to an array of Token structures.
| NULL | realloc was unable to allocate memory. |
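A sketch of how an array of Token pointers might be grown with realloc, assuming the array is kept NULL-terminated (an assumption suggested by deleteTokens taking no length). The parameter names list, num, and token and the one-field Token are hypothetical; the reference gives only the types Token ***, unsigned int *, and Token *.

```c
#include <stdlib.h>

/* Hypothetical minimal Token; the real structure carries more fields. */
typedef struct {
    int type;
} Token;

/* Sketch: append token to *list, which holds *num tokens followed by a
 * NULL terminator (assumed). Returns token, or NULL if realloc fails, in
 * which case *list and *num are left unchanged. */
Token *addToken(Token ***list, unsigned int *num, Token *token)
{
    /* Room for the existing tokens, the new one, and the terminator.
     * realloc(NULL, n) behaves like malloc(n), so an empty list works. */
    Token **mem = realloc(*list, sizeof(Token *) * (*num + 2));
    if (mem == NULL)
        return NULL;
    mem[*num] = token;
    mem[*num + 1] = NULL;
    *list = mem;
    (*num)++;
    return token;
}
```

Assigning realloc's result to a temporary first means a failed call does not lose the caller's existing array.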
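| Token* createToken | ( | TokenType | type, | |
| const char * | image, | |||
| const char * | fname, | |||
| unsigned int | line | |||
| ) |
Creates a Token structure.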
| NULL | malloc was unable to allocate memory. |
| [in] | type | The type of token to create. |
| [in] | image | The characters from the source file that represent the token. |
| [in] | fname | A pointer to the name of the file containing the token. |
| [in] | line | The line number from the source file that the token occurred on. |
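A minimal sketch of creating a token from the four documented parameters. The Token layout and the three-value TokenType below are hypothetical subsets. Copying image while only borrowing fname is a design assumption: each token then owns its text, while all tokens from one file can share a single name string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of TokenType; the real enumeration is much larger. */
typedef enum { TT_INTEGER, TT_IDENTIFIER, TT_HAI } TokenType;

/* Hypothetical Token layout holding just what createToken receives. */
typedef struct {
    TokenType type;
    char *image;        /* private copy of the source characters */
    const char *fname;  /* borrowed pointer to the file name */
    unsigned int line;
} Token;

/* Sketch: allocate and fill a Token. Returns NULL if either allocation
 * fails, freeing anything already allocated. */
Token *createToken(TokenType type, const char *image, const char *fname,
                   unsigned int line)
{
    Token *ret = malloc(sizeof(Token));
    if (ret == NULL)
        return NULL;
    size_t len = strlen(image) + 1;  /* include the terminating '\0' */
    ret->image = malloc(len);
    if (ret->image == NULL) {
        free(ret);
        return NULL;
    }
    memcpy(ret->image, image, len);
    ret->type = type;
    ret->fname = fname;
    ret->line = line;
    return ret;
}
```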
| void deleteToken | ( | Token * | token | ) |
Deletes a Token structure.
| void deleteTokens | ( | Token ** | list | ) |
Deletes an array of Token structures.
| [in,out] | list | A pointer to an array of Token structures to be deleted. |
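A sketch of the two deletion routines, assuming (hypothetically) that a Token owns a heap-allocated image and that the token array is NULL-terminated. Under that convention deleteTokens needs no length parameter, which matches its documented signature.

```c
#include <stdlib.h>

/* Hypothetical Token that owns a heap-allocated copy of its image. */
typedef struct {
    char *image;
} Token;

/* Sketch: free a token and the memory it owns; NULL is ignored. */
void deleteToken(Token *token)
{
    if (token == NULL)
        return;
    free(token->image);
    free(token);
}

/* Sketch: free every token in a NULL-terminated array (assumed layout),
 * then the array itself. */
void deleteTokens(Token **list)
{
    if (list == NULL)
        return;
    for (Token **cur = list; *cur != NULL; cur++)
        deleteToken(*cur);
    free(list);
}
```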
| int isFloat | ( | const char * | image | ) |
Checks if a string of characters follows the format for a floating point decimal.
Specifically, it checks if the string of characters matches the regular expression: [-]?[0-9]\.[0-9]* (where \. denotes a literal decimal point).
| 0 | The string of characters is not a floating point decimal. | |
| 1 | The string of characters is a floating point decimal. |
| [in] | image | The string of characters to compare. |
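A hand-written sketch of the documented pattern, assuming the whole string must match and the '.' in the pattern is a literal decimal point. This is an illustration of the regular expression, not necessarily how lci implements the check.

```c
#include <ctype.h>

/* Sketch: returns 1 if image matches [-]?[0-9]\.[0-9]* in its entirety,
 * 0 otherwise. */
int isFloat(const char *image)
{
    const unsigned char *cur = (const unsigned char *)image;
    if (*cur == '-')
        cur++;               /* optional leading minus sign */
    if (!isdigit(*cur))
        return 0;            /* exactly one digit before the point */
    cur++;
    if (*cur != '.')
        return 0;            /* the decimal point is required */
    cur++;
    while (isdigit(*cur))
        cur++;               /* any number of fractional digits */
    return *cur == '\0';     /* nothing may follow */
}
```

Read literally, the documented pattern allows only one digit before the decimal point, so this sketch rejects "12.5". A tokenizer that accepts longer integer parts would use [0-9]+ in that position instead.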
| int isIdentifier | ( | const char * | image | ) |
Checks if a string of characters follows the format for an identifier.
Specifically, it checks if the string of characters matches the regular expression: [a-zA-Z][a-zA-Z0-9_]*
| 0 | The string of characters is not an identifier. | |
| 1 | The string of characters is an identifier. |
| [in] | image | The string of characters to compare. |
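A minimal sketch of the identifier check, assuming the entire string must match [a-zA-Z][a-zA-Z0-9_]*. It relies on isalpha/isalnum, which match exactly the ASCII letters and digits only in the default "C" locale.

```c
#include <ctype.h>

/* Sketch: returns 1 if image matches [a-zA-Z][a-zA-Z0-9_]* in full. */
int isIdentifier(const char *image)
{
    const unsigned char *cur = (const unsigned char *)image;
    if (!isalpha(*cur))
        return 0;        /* must start with a letter */
    for (cur++; *cur != '\0'; cur++) {
        if (!isalnum(*cur) && *cur != '_')
            return 0;    /* only letters, digits, and underscores follow */
    }
    return 1;
}
```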
| int isInteger | ( | const char * | image | ) |
Checks if a string of characters follows the format for an integer.
Specifically, it checks if the string of characters matches the regular expression: [-]?[1-9][0-9]*|0 (that is, either an optionally negative integer with no leading zeros, or the single digit 0).
| 0 | The string of characters is not an integer. | |
| 1 | The string of characters is an integer. |
| [in] | image | The string of characters to compare. |
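A sketch of the two alternatives in the integer pattern, assuming a full-string match. Note that, as documented, "-0" and zero-padded forms such as "007" are not integers.

```c
#include <ctype.h>
#include <string.h>

/* Sketch: returns 1 if image matches [-]?[1-9][0-9]*|0 in full. */
int isInteger(const char *image)
{
    const unsigned char *cur = (const unsigned char *)image;
    if (strcmp(image, "0") == 0)
        return 1;                /* the lone-zero alternative */
    if (*cur == '-')
        cur++;                   /* optional minus sign */
    if (*cur < '1' || *cur > '9')
        return 0;                /* first digit must be nonzero */
    for (cur++; *cur != '\0'; cur++) {
        if (!isdigit(*cur))
            return 0;            /* remaining characters must be digits */
    }
    return 1;
}
```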
| Token* isKeyword | ( | LexemeList * | lexemes, | |
| unsigned int * | start | |||
| ) |
Checks if a sequence of lexemes is a keyword.
lexemes is searched for keywords, starting at start. If one is found, the appropriate Token structure is created and returned, and the value of start is incremented by the number of lexemes matched minus one.
| NULL | No keywords were matched or there was an error allocating memory. |
| [in] | lexemes | A pointer to a LexemeList structure to search for keywords in. |
| [in,out] | start | A pointer to the position within lexemes to start checking at. |
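A sketch of keyword lookup built on an acceptLexemes-style matcher, assuming a hypothetical keyword table, a trimmed TokenType, a simplified LexemeList, and a one-field Token. Advancing start by the match length minus one is taken from the description above; the assumed reason is that a caller's loop advancing one lexeme per iteration then lands on the first unread lexeme.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for lci's types. */
typedef enum { TT_HAI, TT_HASA, TT_ITZ, TT_BOTHSAEM, TT_BOTHOF } TokenType;
typedef struct { const char **images; unsigned int num; } LexemeList;
typedef struct { TokenType type; } Token;

/* Hypothetical keyword table. The first matching entry wins, so a longer
 * keyword must precede any keyword whose spelling is a prefix of it. */
static const struct { TokenType type; const char *spelling; } keywords[] = {
    { TT_HAI, "HAI" },       { TT_HASA, "I HAS A" },
    { TT_ITZ, "ITZ" },       { TT_BOTHSAEM, "BOTH SAEM" },
    { TT_BOTHOF, "BOTH OF" },
};

/* Returns how many lexemes from start spell match word by word, or 0. */
unsigned int acceptLexemes(LexemeList *lexemes, unsigned int start,
                           const char *match)
{
    unsigned int count = 0;
    while (*match != '\0') {
        size_t len = strcspn(match, " ");
        if (start + count >= lexemes->num)
            return 0;
        const char *image = lexemes->images[start + count];
        if (strlen(image) != len || strncmp(image, match, len) != 0)
            return 0;
        count++;
        match += len;
        while (*match == ' ')
            match++;
    }
    return count;
}

/* Sketch: returns a new Token for the keyword at *start, or NULL. */
Token *isKeyword(LexemeList *lexemes, unsigned int *start)
{
    size_t k;
    for (k = 0; k < sizeof(keywords) / sizeof(keywords[0]); k++) {
        unsigned int n = acceptLexemes(lexemes, *start, keywords[k].spelling);
        if (n == 0)
            continue;            /* this keyword is not at *start */
        Token *ret = malloc(sizeof(Token));
        if (ret == NULL)
            return NULL;         /* allocation failure also yields NULL */
        ret->type = keywords[k].type;
        *start += n - 1;         /* leave *start on the last matched lexeme */
        return ret;
    }
    return NULL;                 /* no keyword matched; *start unchanged */
}
```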
| int isString | ( | const char * | image | ) |
Checks if a string of characters follows the format for a string.
Specifically, it checks if the string of characters begins and ends with a quote character.
| 0 | The string of characters is not a string. | |
| 1 | The string of characters is a string. |
| [in] | image | The string of characters to compare. |
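A one-line sketch of the string check, assuming the quote character is the double quote that delimits LOLCODE string literals. It also assumes that a lone quote is rejected, since one character cannot serve as both the opening and the closing delimiter.

```c
#include <string.h>

/* Sketch: returns 1 if image begins and ends with a double quote and is
 * at least two characters long, 0 otherwise. */
int isString(const char *image)
{
    size_t len = strlen(image);
    return len >= 2 && image[0] == '"' && image[len - 1] == '"';
}
```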
| Token** tokenizeLexemes | ( | LexemeList * | list | ) |
Converts a list of lexemes into tokens.
Additionally parses the literal values of integers, floating point decimals, and strings.
| NULL | An unrecognized token was encountered or memory allocation failed. |
| [in] | list | A pointer to a LexemeList structure to tokenize. |
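A sketch of the per-lexeme classification step a tokenizer like this might perform. It encodes the documented regular expressions directly with POSIX regcomp/regexec, which is a swapped-in technique rather than lci's own checks. The helper names matches and classifyLexeme and the TT_UNKNOWN sentinel are hypothetical. Treating WIN and FAIL as the boolean literals is an assumption based on LOLCODE's TROOF values. Keyword lookup is omitted, although a full tokenizer would need to try keywords before falling back to identifiers.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical subset of token types plus a "no match" sentinel. */
typedef enum { TT_INTEGER, TT_FLOAT, TT_STRING, TT_BOOLEAN, TT_IDENTIFIER,
               TT_UNKNOWN } TokenType;

/* Returns 1 if image matches the POSIX extended regular expression. */
static int matches(const char *pattern, const char *image)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;  /* treat an invalid pattern as no match */
    int ok = regexec(&re, image, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

/* Sketch: classify one lexeme image. Order matters: booleans are tested
 * before identifiers because WIN and FAIL also fit the identifier form. */
TokenType classifyLexeme(const char *image)
{
    if (matches("^(-?[1-9][0-9]*|0)$", image))
        return TT_INTEGER;
    if (matches("^-?[0-9]\\.[0-9]*$", image))
        return TT_FLOAT;
    if (matches("^\".*\"$", image))
        return TT_STRING;
    if (matches("^(WIN|FAIL)$", image))
        return TT_BOOLEAN;
    if (matches("^[a-zA-Z][a-zA-Z0-9_]*$", image))
        return TT_IDENTIFIER;
    return TT_UNKNOWN;  /* corresponds to the NULL "unrecognized" case */
}
```

Compiling each pattern on every call keeps the sketch short. A real tokenizer would compile the patterns once, or use hand-written checks like the ones described for isInteger and isFloat.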