duckdb-spreadsheets / text_functions.tsv
lhoestq HF staff
v0
9a96811
Name Description
string ^@ search_string Return true if string begins with search_string.
string || string Concatenate two strings. Any NULL input results in NULL. See also concat(string, ...).
string[index] Extract a single character using a (1-based) index.
string[begin:end] Extract a string using slice conventions, see slicing.
string LIKE target Returns true if the string matches the like specifier (see Pattern Matching).
string SIMILAR TO regex Returns true if the string matches the regex; identical to regexp_full_match (see Pattern Matching).
array_extract(list, index) Extract a single character using a (1-based) index.
array_slice(list, begin, end) Extract a string using slice conventions. Negative values are accepted.
ascii(string) Returns an integer that represents the Unicode code point of the first character of the string.
bar(x, min, max[, width]) Draw a band whose width is proportional to (x - min) and equal to width characters when x = max. width defaults to 80.
bit_length(string) Number of bits in a string.
chr(x) Returns the character corresponding to the ASCII code value or Unicode code point.
concat_ws(separator, string, ...) Concatenate many strings, separated by separator. NULL inputs are skipped.
concat(string, ...) Concatenate many strings. NULL inputs are skipped. See also string || string.
contains(string, search_string) Return true if search_string is found within string.
ends_with(string, search_string) Return true if string ends with search_string.
format_bytes(bytes) Converts bytes to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.).
format(format, parameters, ...) Formats a string using the fmt syntax.
from_base64(string) Convert a base64 encoded string to a character string.
greatest(x1, x2, ...) Selects the largest value using lexicographical ordering. Note that lowercase characters are considered “larger” than uppercase characters, and collations are not supported.
hash(value) Returns a UBIGINT with the hash of the value.
ilike_escape(string, like_specifier, escape_character) Returns true if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string.
instr(string, search_string) Return location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found.
least(x1, x2, ...) Selects the smallest value using lexicographical ordering. Note that uppercase characters are considered “smaller” than lowercase characters, and collations are not supported.
left_grapheme(string, count) Extract the left-most count grapheme clusters.
left(string, count) Extract the left-most count characters.
length_grapheme(string) Number of grapheme clusters in string.
length(string) Number of characters in string.
like_escape(string, like_specifier, escape_character) Returns true if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string.
lower(string) Convert string to lower case.
lpad(string, count, character) Pads the string with the character from the left until it has count characters.
ltrim(string, characters) Removes any occurrences of any of the characters from the left side of the string.
ltrim(string) Removes any spaces from the left side of the string.
md5(string) Returns the MD5 hash of the string as a VARCHAR.
md5_number(string) Returns the MD5 hash of the string as a HUGEINT.
md5_number_lower(string) Returns the lower 64-bit segment of the MD5 hash of the string as a BIGINT.
md5_number_higher(string) Returns the higher 64-bit segment of the MD5 hash of the string as a BIGINT.
nfc_normalize(string) Convert string to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not.
not_ilike_escape(string, like_specifier, escape_character) Returns false if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string.
not_like_escape(string, like_specifier, escape_character) Returns false if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string.
ord(string) Return ASCII character code of the leftmost character in a string.
parse_dirname(path, separator) Returns the top-level directory name from the given path. separator options: system, both_slash (default), forward_slash, backslash.
parse_dirpath(path, separator) Returns the head of the path (the pathname until the last slash) similarly to Python's os.path.dirname function. separator options: system, both_slash (default), forward_slash, backslash.
parse_filename(path, trim_extension, separator) Returns the last component of the path similarly to Python's os.path.basename function. If trim_extension is true, the file extension will be removed (defaults to false). separator options: system, both_slash (default), forward_slash, backslash.
parse_path(path, separator) Returns a list of the components (directories and filename) in the path similarly to Python's pathlib.PurePath.parts attribute. separator options: system, both_slash (default), forward_slash, backslash.
position(search_string IN string) Return location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found.
printf(format, parameters...) Formats a string using printf syntax.
read_text(source) Returns the content from source (a filename, a list of filenames, or a glob pattern) as a VARCHAR. The file content is first validated to be valid UTF-8. If read_text attempts to read a file with invalid UTF-8, an error is thrown suggesting the use of read_blob instead. See the read_text guide for more details.
regexp_escape(string) Escapes special patterns to turn string into a regular expression similarly to Python's re.escape function.
regexp_extract(string, pattern[, group = 0]) If string contains the regexp pattern, returns the capturing group specified by optional parameter group (see Pattern Matching).
regexp_extract(string, pattern, name_list) If string contains the regexp pattern, returns the capturing groups as a struct with corresponding names from name_list (see Pattern Matching).
regexp_extract_all(string, regex[, group = 0]) Find all non-overlapping occurrences of regex in string and return the values of the capturing group specified by group.
regexp_full_match(string, regex) Returns true if the entire string matches the regex (see Pattern Matching).
regexp_matches(string, pattern) Returns true if string contains the regexp pattern, false otherwise (see Pattern Matching).
regexp_replace(string, pattern, replacement) If string contains the regexp pattern, replaces the matching part with replacement (see Pattern Matching).
regexp_split_to_array(string, regex) Splits the string along the regex.
regexp_split_to_table(string, regex) Splits the string along the regex and returns a row for each part.
repeat(string, count) Repeats the string count number of times.
replace(string, source, target) Replaces any occurrences of the source with target in string.
reverse(string) Reverses the string.
right_grapheme(string, count) Extract the right-most count grapheme clusters.
right(string, count) Extract the right-most count characters.
rpad(string, count, character) Pads the string with the character from the right until it has count characters.
rtrim(string, characters) Removes any occurrences of any of the characters from the right side of the string.
rtrim(string) Removes any spaces from the right side of the string.
sha256(value) Returns a VARCHAR with the SHA-256 hash of the value.
split_part(string, separator, index) Split the string along the separator and return the data at the (1-based) index of the list. If the index is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior).
starts_with(string, search_string) Return true if string begins with search_string.
str_split_regex(string, regex) Splits the string along the regex.
string_split_regex(string, regex) Splits the string along the regex.
string_split(string, separator) Splits the string along the separator.
strip_accents(string) Strips accents from string.
strlen(string) Number of bytes in string.
strpos(string, search_string) Return location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found.
substring(string, start, length) Extract substring of length characters starting from character start. Note that a start value of 1 refers to the first character of the string.
substring_grapheme(string, start, length) Extract substring of length grapheme clusters starting from character start. Note that a start value of 1 refers to the first character of the string.
to_base64(blob) Convert a blob to a base64 encoded string.
trim(string, characters) Removes any occurrences of any of the characters from either side of the string.
trim(string) Removes any spaces from either side of the string.
unicode(string) Returns the Unicode code of the first character of the string.
upper(string) Convert string to upper case.
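
The NULL-handling and indexing rules in the rows for string || string, concat, concat_ws, and split_part are easy to mix up. The following is a minimal plain-Python sketch of those rules as the table describes them. It does not call DuckDB. The helper names (pipe_concat, concat, concat_ws, split_part) are hypothetical stand-ins chosen for illustration, SQL NULL is represented by None, and cases the table does not describe (a NULL separator, zero or negative indexes) are deliberately left out.

```python
from typing import Optional


def pipe_concat(left: Optional[str], right: Optional[str]) -> Optional[str]:
    """Model of `string || string`: any NULL (None) input results in NULL."""
    if left is None or right is None:
        return None
    return left + right


def concat(*values: Optional[str]) -> str:
    """Model of `concat(string, ...)`: NULL (None) inputs are skipped."""
    return "".join(v for v in values if v is not None)


def concat_ws(separator: str, *values: Optional[str]) -> str:
    """Model of `concat_ws(separator, string, ...)`: NULL inputs are skipped."""
    return separator.join(v for v in values if v is not None)


def split_part(string: str, separator: str, index: int) -> str:
    """Model of `split_part`: 1-based index, empty string when out of bounds."""
    if index < 1:
        # Assumption: the table only describes positive indexes, so other
        # values are rejected here rather than guessed at.
        raise ValueError("only positive indexes are modeled in this sketch")
    parts = string.split(separator)
    return parts[index - 1] if index <= len(parts) else ""


if __name__ == "__main__":
    print(pipe_concat("Duck", None))         # None
    print(concat("Duck", None, "DB"))        # DuckDB
    print(concat_ws(", ", "a", None, "b"))   # a, b
    print(repr(split_part("a;b;c", ";", 2)))  # 'b'
    print(repr(split_part("a;b;c", ";", 5)))  # ''
```

The point of the contrast is that the || operator propagates NULL while concat and concat_ws skip it, so the function forms are the safer choice when some inputs may be missing.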