tokenizer.h File Reference

Structures and functions for grouping lexemes into tokens.

Data Structures

struct  Token
 Stores a token and any value parsed by the tokenizer.
union  TokenData
 Stores the data associated with a Token structure.

Enumerations

enum  TokenType {
  TT_INTEGER, TT_FLOAT, TT_STRING, TT_IDENTIFIER,
  TT_BOOLEAN, TT_IT, TT_NOOB, TT_NUMBR,
  TT_NUMBAR, TT_TROOF, TT_YARN, TT_EOF,
  TT_NEWLINE, TT_HAI, TT_KTHXBYE, TT_HASA,
  TT_ITZ, TT_R, TT_ANYR, TT_AN,
  TT_SUMOF, TT_DIFFOF, TT_PRODUKTOF, TT_QUOSHUNTOF,
  TT_MODOF, TT_BIGGROF, TT_SMALLROF, TT_BOTHOF,
  TT_EITHEROF, TT_WONOF, TT_NOT, TT_MKAY,
  TT_ALLOF, TT_ANYOF, TT_BOTHSAEM, TT_DIFFRINT,
  TT_MAEK, TT_A, TT_ISNOWA, TT_VISIBLE,
  TT_SMOOSH, TT_BANG, TT_GIMMEH, TT_ORLY,
  TT_YARLY, TT_MEBBE, TT_NOWAI, TT_OIC,
  TT_WTF, TT_OMG, TT_OMGWTF, TT_GTFO,
  TT_IMINYR, TT_UPPIN, TT_NERFIN, TT_YR,
  TT_TIL, TT_WILE, TT_IMOUTTAYR, TT_HOWDUZ,
  TT_IFUSAYSO, TT_FOUNDYR, TT_ENDOFTOKENS
}
 

Denotes the type of token present.

Functions

unsigned int acceptLexemes (LexemeList *, unsigned int, const char *)
 Tries to match a sequence of lexemes.
Token * addToken (Token ***, unsigned int *, Token *)
 Adds a Token to an array of Token structures.
Token * createToken (TokenType, const char *, const char *, unsigned int)
 Creates a Token structure.
void deleteToken (Token *)
 Deletes a Token structure.
void deleteTokens (Token **)
 Deletes an array of Token structures.
int isFloat (const char *)
 Checks if a string of characters follows the format for a floating point decimal.
int isIdentifier (const char *)
 Checks if a string of characters follows the format for an identifier.
int isInteger (const char *)
 Checks if a string of characters follows the format for an integer.
Token * isKeyword (LexemeList *, unsigned int *)
 Checks if a sequence of lexemes is a keyword.
int isString (const char *)
 Checks if a string of characters follows the format for a string.
Token ** tokenizeLexemes (LexemeList *)
 Converts a list of lexemes into tokens.

Detailed Description

Structures and functions for grouping lexemes into tokens.

The tokenizer reads through an array of lexemes (generated by the lexer) and groups them into tokens based on their structure. In addition, some lexemes with semantic meaning (such as integers, floats, strings, and booleans) will have their values extracted and stored.

Author:
Justin J. Meza
Date:
2010

Enumeration Type Documentation

enum TokenType

Denotes the type of token present.

All of the token type names are self-explanatory and correspond to either the semantic type of token data (in the case of TT_INTEGER, TT_FLOAT, TT_STRING, or TT_IDENTIFIER) or the lexemes which make up the particular token.


Function Documentation

unsigned int acceptLexemes ( LexemeList *  lexemes,
unsigned int  start,
const char *  match 
)

Tries to match a sequence of lexemes.

Scans through lexemes starting at start and tries to match space-delimited lexemes from match.

Precondition:
lexemes was created by scanBuffer(const char *, unsigned int, const char *).
Returns:
The number of lexemes matched.
Parameters:
[in] lexemes A pointer to a LexemeList structure to match lexemes from.
[in] start The position within lexemes to start matching at.
[in] match A pointer to a character array describing the sequence of lexemes to match.
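
The matching described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `SketchLexemeList` is a hypothetical stand-in for `LexemeList` (whose layout is not shown on this page), and returning 0 when only part of `match` is found is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LexemeList: a plain array of lexeme images. */
typedef struct {
    unsigned int num;
    const char **images;
} SketchLexemeList;

/* Returns how many lexemes, starting at index start, match the
 * space-delimited words in match. Assumption: a partial match counts
 * as no match at all, so the result is either 0 or the word count. */
unsigned int sketch_accept_lexemes(const SketchLexemeList *lexemes,
                                   unsigned int start,
                                   const char *match)
{
    unsigned int matched = 0;
    const char *word = match;
    while (*word != '\0') {
        /* Length of the current word, up to the next space or the end. */
        size_t len = strcspn(word, " ");
        const char *image;
        if (start + matched >= lexemes->num) return 0; /* ran out of lexemes */
        image = lexemes->images[start + matched];
        if (strlen(image) != len || strncmp(image, word, len) != 0) return 0;
        matched++;
        word += len;
        while (*word == ' ') word++; /* skip the delimiter */
    }
    return matched;
}
```

With the lexemes `I`, `HAS`, `A`, `x`, calling the sketch with `start` set to 1 and `match` set to `"HAS A"` yields 2.
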
Token* addToken ( Token ***  list,
unsigned int *  num,
Token *  token 
)

Adds a Token to an array of Token structures.

Note:
list may be NULL, in which case a new list is created.
Precondition:
num is the number of elements in list.
Postcondition:
token will be added on to the end of list and the value at num will be updated accordingly.
Returns:
A pointer to the added Token structure (will be the same as token).
Return values:
NULL realloc was unable to allocate memory.
See also:
deleteTokens(Token **)
Parameters:
[in,out] list A pointer to a pointer to an array of Token structures to add the new Token onto.
[in,out] num A pointer to the number of elements in list.
[in] token A pointer to the Token structure to add to list.
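
The contract above (grow the array with realloc, append, update the count, return NULL on failure) might look like the sketch below. `SketchToken` is a hypothetical placeholder type. Two details are assumptions rather than documented behaviour: "list may be NULL" is read as `*list` being NULL, and the array is kept NULL-terminated, which would explain how deleteTokens(Token **) can work without a count.

```c
#include <stdlib.h>

/* Hypothetical placeholder for the real Token structure. */
typedef struct {
    int type;
} SketchToken;

/* Appends token to *list and bumps *num. On allocation failure the
 * list and the count are left untouched and NULL is returned. */
SketchToken *sketch_add_token(SketchToken ***list, unsigned int *num,
                              SketchToken *token)
{
    /* Room for the existing entries, the new one, and a NULL terminator
     * (the terminator is an assumption). realloc(NULL, n) acts like
     * malloc(n), which covers the "new list" case. */
    SketchToken **grown = realloc(*list, sizeof(SketchToken *) * (*num + 2));
    if (grown == NULL) return NULL;
    grown[*num] = token;
    grown[*num + 1] = NULL;
    *list = grown;
    (*num)++;
    return token;
}
```

A caller would start with a NULL list and a count of 0, then pass the addresses of both on every call.
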
Token* createToken ( TokenType  type,
const char *  image,
const char *  fname,
unsigned int  line 
)

Creates a Token structure.

Returns:
A pointer to a Token structure with the desired properties.
Return values:
NULL malloc was unable to allocate memory.
See also:
deleteToken(Token *)

Note:
fname is not copied because only one copy is stored for all Token structures that share it.

Parameters:
[in] type The type of token to create.
[in] image The characters from the source file that represent the token.
[in] fname A pointer to the name of the file containing the token.
[in] line The line number from the source file that the token occurred on.
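
To make the ownership note concrete, here is a small sketch of how such a constructor and its matching destructor could treat the two strings. The `SketchToken` layout is hypothetical, and copying `image` is an assumption inferred from the note that only `fname` is shared.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout; the real Token structure is documented elsewhere. */
typedef struct {
    int type;
    char *image;        /* owned copy (assumption) */
    const char *fname;  /* borrowed pointer shared between tokens */
    unsigned int line;
} SketchToken;

SketchToken *sketch_create_token(int type, const char *image,
                                 const char *fname, unsigned int line)
{
    SketchToken *token = malloc(sizeof *token);
    if (token == NULL) return NULL;
    token->image = malloc(strlen(image) + 1);
    if (token->image == NULL) {
        free(token);
        return NULL;
    }
    strcpy(token->image, image);
    token->type = type;
    token->fname = fname; /* stored as-is, not duplicated */
    token->line = line;
    return token;
}

void sketch_delete_token(SketchToken *token)
{
    if (token == NULL) return;
    free(token->image); /* fname is borrowed, so it is not freed here */
    free(token);
}
```

Under this reading, the string passed as `fname` has to outlive every token that refers to it.
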
void deleteToken ( Token *  token  ) 

Deletes a Token structure.

Precondition:
token points to a Token structure created by createToken(TokenType, const char *, const char *, unsigned int).
Postcondition:
The memory at token and all of its elements will be freed.
See also:
createToken(TokenType, const char *, const char *, unsigned int)
void deleteTokens ( Token **  list  ) 

Deletes an array of Token structures.

Precondition:
list was created by and contains items added by addToken(Token ***, unsigned int *, Token *).
Postcondition:
The memory at list and all of its elements will be freed.
See also:
addToken(Token ***, unsigned int *, Token *)
Parameters:
[in,out] list A pointer to an array of Token structures to be deleted.
int isFloat ( const char *  image  ) 

Checks if a string of characters follows the format for a floating point decimal.

Specifically, it checks if the string of characters matches the regular expression: [-]?[0-9].[0-9]*

Return values:
0 The string of characters is not a floating point decimal.
1 The string of characters is a floating point decimal.
See also:
isInteger(const char *)
isString(const char *)
isIdentifier(const char *)
Parameters:
[in] image The string of characters to compare.
int isIdentifier ( const char *  image  ) 

Checks if a string of characters follows the format for an identifier.

Specifically, it checks if the string of characters matches the regular expression: [a-zA-Z][a-zA-Z0-9_]*

Return values:
0 The string of characters is not an identifier.
1 The string of characters is an identifier.
See also:
isInteger(const char *)
isFloat(const char *)
isString(const char *)
Parameters:
[in] image The string of characters to compare.
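
As an illustration of the expression above, a hand-written matcher might look like this. `sketch_is_identifier` and its helpers are hypothetical names, and the range checks assume an ASCII-compatible character set.

```c
/* Helpers that mirror the character classes in the expression. */
static int sketch_is_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int sketch_is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Returns 1 if image matches [a-zA-Z][a-zA-Z0-9_]*, otherwise 0. */
int sketch_is_identifier(const char *image)
{
    const char *cur = image;
    if (!sketch_is_letter(*cur)) return 0; /* first character must be a letter */
    cur++;
    while (sketch_is_letter(*cur) || sketch_is_digit(*cur) || *cur == '_')
        cur++;
    return *cur == '\0'; /* the whole string must be consumed */
}
```

So `foo_bar2` is accepted, while `_x`, `2x`, and the empty string are rejected.
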
int isInteger ( const char *  image  ) 

Checks if a string of characters follows the format for an integer.

Specifically, it checks if the string of characters matches the regular expression: [-]?[1-9][0-9]* | 0

Return values:
0 The string of characters is not an integer.
1 The string of characters is an integer.
See also:
isFloat(const char *)
isString(const char *)
isIdentifier(const char *)
Parameters:
[in] image The string of characters to compare.
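
Read literally, the expression above accepts a lone `0`, or an optional minus sign followed by a nonzero digit and any further digits. A sketch of that reading follows; `sketch_is_integer` is a hypothetical name, and the actual function may treat edge cases differently.

```c
/* Returns 1 if image matches [-]?[1-9][0-9]* | 0, otherwise 0. */
int sketch_is_integer(const char *image)
{
    const char *cur = image;
    if (cur[0] == '0' && cur[1] == '\0') return 1; /* the lone "0" alternative */
    if (*cur == '-') cur++;                        /* optional sign */
    if (*cur < '1' || *cur > '9') return 0;        /* leading digit must be 1-9 */
    cur++;
    while (*cur >= '0' && *cur <= '9') cur++;      /* remaining digits */
    return *cur == '\0';                           /* nothing may follow */
}
```

Under this reading `42` and `-7` are integers, while `007` and `-0` are not.
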
Token* isKeyword ( LexemeList *  lexemes,
unsigned int *  start 
)

Checks if a sequence of lexemes is a keyword.

lexemes is searched starting at start for keywords. If one is found, the appropriate Token structure is created and returned and the value of start is incremented by the number of lexemes matched minus one.

Precondition:
lexemes was created by scanBuffer(const char *, unsigned int, const char *).
Postcondition:
If a keyword is not found, start will be unmodified. Otherwise, start will be incremented by the number of lexemes matched minus one.
Returns:
A pointer to a newly created keyword Token structure.
Return values:
NULL No keywords were matched or there was an error allocating memory.
Parameters:
[in] lexemes A pointer to a LexemeList structure to search for keywords in.
[in,out] start A pointer to the position within lexemes to start checking at.
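
The "matched minus one" adjustment leaves start on the last lexeme of the keyword, presumably so that a caller's own increment then steps past it. The sketch below illustrates the cursor arithmetic only. Everything in it is hypothetical: the three-entry keyword table, the use of a plain string array in place of LexemeList, and returning a table index instead of a Token pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword table; each entry is a space-delimited sequence. */
static const char *const sketch_keywords[] = { "HAS A", "SUM OF", "KTHXBYE" };

/* Counts the lexemes at images[pos...] that match every word in keyword;
 * returns 0 unless the whole keyword is present. */
static unsigned int sketch_match(const char **images, unsigned int num,
                                 unsigned int pos, const char *keyword)
{
    unsigned int matched = 0;
    while (*keyword != '\0') {
        size_t len = strcspn(keyword, " ");
        if (pos + matched >= num) return 0;
        if (strlen(images[pos + matched]) != len
            || strncmp(images[pos + matched], keyword, len) != 0) return 0;
        matched++;
        keyword += len;
        while (*keyword == ' ') keyword++;
    }
    return matched;
}

/* Returns the index of the matched keyword, or -1 if none matches.
 * On a match, *start advances by the number of lexemes matched minus
 * one; otherwise it is left unmodified. */
int sketch_is_keyword(const char **images, unsigned int num,
                      unsigned int *start)
{
    unsigned int k;
    for (k = 0; k < sizeof sketch_keywords / sizeof sketch_keywords[0]; k++) {
        unsigned int matched = sketch_match(images, num, *start,
                                            sketch_keywords[k]);
        if (matched > 0) {
            *start += matched - 1;
            return (int)k;
        }
    }
    return -1;
}
```

For the lexemes `I`, `HAS`, `A`, `x` with start at 1, the two-lexeme keyword `HAS A` matches and start becomes 2, the index of `A`.
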
int isString ( const char *  image  ) 

Checks if a string of characters follows the format for a string.

Specifically, it checks if the string of characters begins and ends with a quote character.

Return values:
0 The string of characters is not a string.
1 The string of characters is a string.
See also:
isInteger(const char *)
isFloat(const char *)
isIdentifier(const char *)
Parameters:
[in] image The string of characters to compare.
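
A minimal sketch of the check described above. It assumes the quote character is the double quote and that a lone quote does not count as both the opening and closing delimiter; neither detail is stated on this page, and `sketch_is_string` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if image begins and ends with a double quote, otherwise 0. */
int sketch_is_string(const char *image)
{
    size_t len = strlen(image);
    /* Require at least two characters so a single '"' is rejected
     * (an assumption about the intended behaviour). */
    return len >= 2 && image[0] == '"' && image[len - 1] == '"';
}
```

Note that this only inspects the first and last characters; it does not validate escapes or embedded quotes.
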
Token** tokenizeLexemes ( LexemeList *  list  ) 

Converts a list of lexemes into tokens.

Additionally parses the literal values of integers, floating point decimals, and strings.

Precondition:
list was created by scanBuffer(const char *, unsigned int, const char *).
Returns:
A pointer to an array of Token structures representing the tokenized form of the input lexeme stream.
Return values:
NULL An unrecognized token was encountered or memory allocation failed.
Parameters:
[in] list A pointer to a LexemeList structure to tokenize.