Class Lexer

java.lang.Object
org.apache.commons.csv.Lexer
All Implemented Interfaces:
Closeable, AutoCloseable

final class Lexer extends Object implements Closeable
Lexical analyzer.
  • Field Details

    • CR_STRING

      private static final String CR_STRING
    • LF_STRING

      private static final String LF_STRING
    • DISABLED

      private static final char DISABLED
      Constant char to use for disabling comments, escapes and encapsulation. The value -2 is used because it won't be confused with an EOF signal (-1), and because as a char it is U+FFFE, a Unicode noncharacter, so there should never be a collision with a real text char.
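The arithmetic behind this sentinel can be checked directly. The following is a minimal sketch, not the library's code: the class name, the EOF constant and the matches helper are hypothetical names introduced for illustration.

```java
final class DisabledSentinelDemo {
    // Hypothetical stand-ins for the constants described above.
    static final int EOF = -1;               // value Reader.read() returns at end of stream
    static final char DISABLED = (char) -2;  // narrowing -2 to char yields 0xFFFE

    // Characters are compared as ints, so EOF (-1) can never equal a char,
    // and DISABLED matches only if the input really contains 0xFFFE.
    static boolean matches(int ch, char configured) {
        return ch == configured;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(DISABLED)); // prints fffe
        System.out.println(matches(EOF, DISABLED));        // prints false
        System.out.println(matches('"', DISABLED));        // prints false
    }
}
```

Because a char widens to a non-negative int, a disabled setting in this sketch never matches the end-of-stream value.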
    • delimiter

      private final char[] delimiter
    • delimiterBuf

      private final char[] delimiterBuf
    • escapeDelimiterBuf

      private final char[] escapeDelimiterBuf
    • escape

      private final char escape
    • quoteChar

      private final char quoteChar
    • commentStart

      private final char commentStart
    • ignoreSurroundingSpaces

      private final boolean ignoreSurroundingSpaces
    • ignoreEmptyLines

      private final boolean ignoreEmptyLines
    • reader

      private final ExtendedBufferedReader reader
      The input stream.
    • firstEol

      private String firstEol
    • isLastTokenDelimiter

      private boolean isLastTokenDelimiter
  • Constructor Details

  • Method Details

    • close

      public void close() throws IOException
      Closes resources.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException - If an I/O error occurs
    • getCharacterPosition

      long getCharacterPosition()
      Returns the current character position.
      Returns:
      the current character position
    • getCurrentLineNumber

      long getCurrentLineNumber()
      Returns the current line number.
      Returns:
      the current line number
    • getFirstEol

      String getFirstEol()
    • isClosed

      boolean isClosed()
    • isCommentStart

      boolean isCommentStart(int ch)
    • isDelimiter

      boolean isDelimiter(int ch) throws IOException
      Determines whether the next characters constitute a delimiter through ExtendedBufferedReader.lookAhead(char[]).
      Parameters:
      ch - the current character.
      Returns:
      true if the next characters constitute a delimiter.
      Throws:
      IOException - If an I/O error occurs.
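A multi-character delimiter check with look-ahead can be sketched as follows. This is an illustration under stated assumptions, not the Lexer's implementation: it emulates the look-ahead with java.io.BufferedReader mark/reset instead of ExtendedBufferedReader.lookAhead(char[]), and the class and the check helper are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class DelimiterLookAheadDemo {
    // Returns true if ch followed by the upcoming characters spells the delimiter.
    // On a match the rest of the delimiter is consumed; otherwise the reader
    // is rewound so that nothing after ch is lost.
    static boolean isDelimiter(int ch, char[] delimiter, BufferedReader in) throws IOException {
        if (ch != delimiter[0]) {
            return false;
        }
        if (delimiter.length == 1) {
            return true;
        }
        int rest = delimiter.length - 1;
        in.mark(rest);                       // remember the position for a possible rewind
        for (int i = 1; i <= rest; i++) {
            if (in.read() != delimiter[i]) { // also a mismatch at end of stream (-1)
                in.reset();
                return false;
            }
        }
        return true;
    }

    // Demo helper: reports the match result and the first character left in the input.
    static String check(String input, String delimiter) {
        try (BufferedReader in = new BufferedReader(new StringReader(input))) {
            boolean match = isDelimiter(in.read(), delimiter.toCharArray(), in);
            return match + " next=" + (char) in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(check("[|]b", "[|]")); // prints true next=b
        System.out.println(check("[|xb", "[|]")); // prints false next=|
    }
}
```

Rewinding on a mismatch matters because a partial match such as "[|x" is ordinary field content, and its characters still have to be read as such.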
    • isEndOfFile

      boolean isEndOfFile(int ch)
      Tests if the given character indicates end of file.
      Returns:
      true if the given character indicates end of file.
    • isEscape

      boolean isEscape(int ch)
      Tests if the given character is the escape character.
      Returns:
      true if the given character is the escape character.
    • isEscapeDelimiter

      boolean isEscapeDelimiter() throws IOException
      Tests if the next characters constitute an escape delimiter through ExtendedBufferedReader.lookAhead(char[]). For example, for delimiter "[|]" and escape '!', returns true if the next characters constitute "![!|!]".
      Returns:
      true if the next characters constitute an escape delimiter.
      Throws:
      IOException - If an I/O error occurs.
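The escaped form in the example above follows from prefixing each delimiter character with the escape character. A minimal sketch, working on strings rather than on the reader; the class and method names are hypothetical:

```java
final class EscapeDelimiterDemo {
    // Builds the escaped form of a delimiter by prefixing every delimiter
    // character with the escape character.
    static String escapeDelimiter(String delimiter, char escape) {
        StringBuilder sb = new StringBuilder(delimiter.length() * 2);
        for (int i = 0; i < delimiter.length(); i++) {
            sb.append(escape).append(delimiter.charAt(i));
        }
        return sb.toString();
    }

    // Tests whether the input, starting at offset, begins with the escaped delimiter.
    static boolean isEscapeDelimiterAt(String input, int offset, String delimiter, char escape) {
        return input.startsWith(escapeDelimiter(delimiter, escape), offset);
    }

    public static void main(String[] args) {
        System.out.println(escapeDelimiter("[|]", '!'));                    // prints ![!|!]
        System.out.println(isEscapeDelimiterAt("a![!|!]b", 1, "[|]", '!')); // prints true
        System.out.println(isEscapeDelimiterAt("a![|]b", 1, "[|]", '!'));   // prints false
    }
}
```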
    • isMetaChar

      private boolean isMetaChar(int ch)
    • isQuoteChar

      boolean isQuoteChar(int ch)
    • isStartOfLine

      boolean isStartOfLine(int ch)
      Tests if the current character represents the start of a line: it is a CR or LF, or the position is at the start of the file.
      Parameters:
      ch - the character to check
      Returns:
      true if the character is at the start of a line.
    • mapNullToDisabled

      private char mapNullToDisabled(Character c)
    • nextToken

      Token nextToken(Token token) throws IOException
      Returns the next token.

      A token corresponds to a term, a record change or an end-of-file indicator.

      Parameters:
      token - an existing Token object to reuse. The caller is responsible for initializing the Token.
      Returns:
      the next token found.
      Throws:
      IOException - on stream access error.
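The reuse contract (one Token instance, reset by the caller before each call) can be sketched with a deliberately tiny tokenizer. Everything here is hypothetical and simplified: the Token class, the Type enum and the readAll helper are illustrative stand-ins, and only unquoted fields, a ',' delimiter and '\n' line ends are handled.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class TokenReuseDemo {
    enum Type { TOKEN, EORECORD, EOF }

    // Simplified token: a type plus a reusable content buffer.
    static final class Token {
        Type type;
        final StringBuilder content = new StringBuilder();

        void reset() {
            type = null;
            content.setLength(0);
        }
    }

    // Fills the given token with the next term and records what ended it.
    static Token nextToken(Reader in, Token token) throws IOException {
        int ch;
        while ((ch = in.read()) != -1) {
            if (ch == ',') { token.type = Type.TOKEN; return token; }
            if (ch == '\n') { token.type = Type.EORECORD; return token; }
            token.content.append((char) ch);
        }
        token.type = Type.EOF;
        return token;
    }

    static List<List<String>> readAll(String input) {
        List<List<String>> records = new ArrayList<>();
        List<String> current = new ArrayList<>();
        Token token = new Token();               // one instance for the whole input
        try (Reader in = new StringReader(input)) {
            do {
                token.reset();                   // the caller initializes the token
                nextToken(in, token);
                boolean emptyTail = token.type == Type.EOF
                        && token.content.length() == 0 && current.isEmpty();
                if (!emptyTail) {
                    current.add(token.content.toString());
                }
                if (token.type != Type.TOKEN && !current.isEmpty()) {
                    records.add(current);        // a record change or EOF closes the record
                    current = new ArrayList<>();
                }
            } while (token.type != Type.EOF);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(readAll("a,b\nc,d\n")); // prints [[a, b], [c, d]]
    }
}
```

Reusing one token avoids allocating an object per field, which is why the content has to be copied out before the next call.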
    • parseEncapsulatedToken

      private Token parseEncapsulatedToken(Token token) throws IOException
      Parses an encapsulated token.

      Encapsulated tokens are surrounded by the given encapsulating string. The encapsulator itself may be included in the token using a doubling syntax (as in "", '') or using escaping (as in \", \'). Whitespace before and after an encapsulated token is ignored. The token is finished when one of the following conditions becomes true:

      • an unescaped encapsulator has been reached, and is followed by optional whitespace then:
        • delimiter (TOKEN)
        • end of line (EORECORD)
      • end of stream has been reached (EOF)
      Parameters:
      token - the current token
      Returns:
      a valid token object
      Throws:
      IOException - on invalid state: EOF before closing encapsulator or invalid character before delimiter or EOL
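The finishing conditions listed above can be sketched as a small state loop. This is a simplified illustration, not the Lexer's code: it handles only the doubling syntax (no escape character), treats only '\n' as end of line, assumes the opening encapsulator was already consumed, and uses hypothetical Token, Type and parse names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class EncapsulatedTokenDemo {
    enum Type { TOKEN, EORECORD, EOF }

    static final class Token {
        Type type;
        final StringBuilder content = new StringBuilder();
    }

    // Assumes the opening quote has already been consumed by the caller.
    static Token parseEncapsulated(Reader in, char quote, char delimiter, Token token)
            throws IOException {
        while (true) {
            int ch = in.read();
            if (ch == -1) {
                throw new IOException("EOF reached before the closing encapsulator");
            }
            if (ch != quote) {
                token.content.append((char) ch);
                continue;
            }
            int next = in.read();
            if (next == quote) {                  // doubled encapsulator: keep one copy
                token.content.append(quote);
                continue;
            }
            while (next == ' ' || next == '\t') { // optional whitespace after the closing quote
                next = in.read();
            }
            if (next == delimiter) {
                token.type = Type.TOKEN;
            } else if (next == '\n') {
                token.type = Type.EORECORD;
            } else if (next == -1) {
                token.type = Type.EOF;
            } else {
                throw new IOException("Invalid character after the closing encapsulator");
            }
            return token;
        }
    }

    // Demo helper: parses the text that follows an opening double quote.
    static String parse(String afterOpeningQuote) {
        try (Reader in = new StringReader(afterOpeningQuote)) {
            Token t = parseEncapsulated(in, '"', ',', new Token());
            return t.type + ":" + t.content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("say \"\"hi\"\"\" ,next")); // prints TOKEN:say "hi"
    }
}
```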
    • parseSimpleToken

      private Token parseSimpleToken(Token token, int ch) throws IOException
      Parses a simple token.

      Simple tokens are tokens which are not surrounded by encapsulators. A simple token might contain escaped delimiters (as \, or \;). The token is finished when one of the following conditions becomes true:

      • end of line has been reached (EORECORD)
      • end of stream has been reached (EOF)
      • an unescaped delimiter has been reached (TOKEN)
      Parameters:
      token - the current token
      ch - the current character
      Returns:
      the filled token
      Throws:
      IOException - on stream access error
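The three finishing conditions can be sketched as follows. This is an illustration with simplifying assumptions: an escaped character is kept literally (no escape-sequence mapping), only '\n' counts as end of line, and the Token, Type and parse names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class SimpleTokenDemo {
    enum Type { TOKEN, EORECORD, EOF }

    static final class Token {
        Type type;
        final StringBuilder content = new StringBuilder();
    }

    // ch is the character the caller has already read.
    static Token parseSimple(Reader in, int ch, char delimiter, char escape, Token token)
            throws IOException {
        while (true) {
            if (ch == '\n') {                 // end of line
                token.type = Type.EORECORD;
                return token;
            }
            if (ch == -1) {                   // end of stream
                token.type = Type.EOF;
                return token;
            }
            if (ch == delimiter) {            // unescaped delimiter
                token.type = Type.TOKEN;
                return token;
            }
            if (ch == escape) {
                int escaped = in.read();
                if (escaped == -1) {
                    throw new IOException("Escape character at end of stream");
                }
                token.content.append((char) escaped); // simplification: keep the escaped char as-is
            } else {
                token.content.append((char) ch);
            }
            ch = in.read();
        }
    }

    // Demo helper: uses a backslash as the escape character.
    static String parse(String input, char delimiter) {
        try (Reader in = new StringReader(input)) {
            Token t = parseSimple(in, in.read(), delimiter, '\\', new Token());
            return t.type + ":" + t.content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("a\\,b,c", ','));    // prints TOKEN:a,b
        System.out.println(parse("x\\;y\nz", ';'));   // prints EORECORD:x;y
        System.out.println(parse("last", ','));       // prints EOF:last
    }
}
```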
    • readEndOfLine

      boolean readEndOfLine(int ch) throws IOException
      Greedily accepts \n, \r and \r\n. This checker silently consumes the second control-character...
      Returns:
      true if the given or next character is a line-terminator
      Throws:
      IOException
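Greedy acceptance of the three line endings can be sketched like this. The sketch uses java.io.PushbackReader to peek at the character after a \r, which is an assumption made for illustration and not how the Lexer reads ahead; the class and the check helper are hypothetical.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class EndOfLineDemo {
    // Returns true if ch starts a line terminator. For "\r\n" the trailing
    // '\n' is consumed as well, so the pair counts as a single line end.
    static boolean readEndOfLine(int ch, PushbackReader in) throws IOException {
        if (ch == '\r') {
            int next = in.read();
            if (next != '\n' && next != -1) {
                in.unread(next);          // not part of the line end: put it back
            }
            return true;
        }
        return ch == '\n';
    }

    // Demo helper: reports the result for the first character and what is left to read.
    static String check(String input) {
        try (PushbackReader in = new PushbackReader(new StringReader(input))) {
            boolean eol = readEndOfLine(in.read(), in);
            StringBuilder rest = new StringBuilder();
            for (int c = in.read(); c != -1; c = in.read()) {
                rest.append((char) c);
            }
            return eol + "|" + rest;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(check("\r\nab")); // prints true|ab
        System.out.println(check("\rab"));   // prints true|ab
        System.out.println(check("\nab"));   // prints true|ab
        System.out.println(check("xab"));    // prints false|ab
    }
}
```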
    • readEscape

      int readEscape() throws IOException
      Handle an escape sequence. The current character must be the escape character. On return, the next character is available by calling ExtendedBufferedReader.getLastChar() on the input stream.
      Returns:
      the unescaped character (as an int) or Constants.END_OF_STREAM if the character following the escape is invalid.
      Throws:
      IOException - if there is a problem reading the stream or the end of stream is detected: the escape character is not allowed at end of stream
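The return contract (unescaped character, an invalid-escape marker, or an exception at end of stream) can be sketched as below. The specific letter mappings (r, n, t, b, f), the metaChars parameter and the value -1 for END_OF_STREAM are assumptions chosen for illustration; the source above does not list them.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReadEscapeDemo {
    // Assumed value, standing in for Constants.END_OF_STREAM.
    static final int END_OF_STREAM = -1;

    // Reads the character after an escape character and returns what it stands for.
    // metaChars lists the characters that may be escaped literally (assumption).
    static int readEscape(Reader in, String metaChars) throws IOException {
        int ch = in.read();
        switch (ch) {
            case 'r': return '\r';
            case 'n': return '\n';
            case 't': return '\t';
            case 'b': return '\b';
            case 'f': return '\f';
            case '\r': case '\n': case '\t': case '\b': case '\f':
                return ch;                                    // control characters pass through
            case -1:
                throw new IOException("End of stream while processing escape sequence");
            default:
                return metaChars.indexOf(ch) >= 0 ? ch : END_OF_STREAM;
        }
    }

    // Demo helper: afterEscape is the input that follows the escape character.
    static int unescape(String afterEscape, String metaChars) {
        try (Reader in = new StringReader(afterEscape)) {
            return readEscape(in, metaChars);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("n", ",") == '\n');          // prints true
        System.out.println(unescape(",", ",") == ',');           // prints true
        System.out.println(unescape("x", ",") == END_OF_STREAM); // prints true
    }
}
```

Returning a marker for an invalid escape, instead of throwing, leaves the caller free to decide whether to keep the escape character as literal text.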
    • trimTrailingSpaces

      void trimTrailingSpaces(StringBuilder buffer)