libcudf  24.02.00
Character Types

Files

file  char_types.hpp
 
file  char_types_enum.hpp
 

Enumerations

enum  cudf::strings::string_character_types : uint32_t {
  cudf::strings::DECIMAL = 1 << 0 , cudf::strings::NUMERIC = 1 << 1 , cudf::strings::DIGIT = 1 << 2 , cudf::strings::ALPHA = 1 << 3 ,
  cudf::strings::SPACE = 1 << 4 , cudf::strings::UPPER = 1 << 5 , cudf::strings::LOWER = 1 << 6 , cudf::strings::ALPHANUM = DECIMAL | NUMERIC | DIGIT | ALPHA ,
  cudf::strings::CASE_TYPES = UPPER | LOWER , cudf::strings::ALL_TYPES = ALPHANUM | CASE_TYPES | SPACE
}
 Character type values. These types can be or'd to check for any combination of types.
 

Functions

std::unique_ptr< column > cudf::strings::all_characters_of_type (strings_column_view const &input, string_character_types types, string_character_types verify_types=string_character_types::ALL_TYPES, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Returns a boolean column identifying string entries in which all characters are of the type specified.
 
std::unique_ptr< column > cudf::strings::filter_characters_of_type (strings_column_view const &input, string_character_types types_to_remove, string_scalar const &replacement=string_scalar(""), string_character_types types_to_keep=string_character_types::ALL_TYPES, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Filter specific character types from a column of strings.
 
constexpr string_character_types cudf::strings::operator| (string_character_types lhs, string_character_types rhs)
 OR operator for combining string_character_types.
 
constexpr string_character_types & cudf::strings::operator|= (string_character_types &lhs, string_character_types rhs)
 Compound assignment OR operator for combining string_character_types.
 

Detailed Description

Enumeration Type Documentation

◆ string_character_types

Character type values. These types can be or'd to check for any combination of types.

This cannot be turned into an enum class because or'd entries can produce values that are not named in the class. For example, NUMERIC|SPACE is a valid, reasonable combination but does not match any explicitly named enumerator.

Enumerator
DECIMAL 

all decimal characters

NUMERIC 

all numeric characters

DIGIT 

all digit characters

ALPHA 

all alphabetic characters

SPACE 

all space characters

UPPER 

all upper case characters

LOWER 

all lower case characters

ALPHANUM 

all alphanumeric characters

CASE_TYPES 

all case-able characters

ALL_TYPES 

all character types

Definition at line 38 of file char_types_enum.hpp.
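
The bit layout above can be checked with a small standalone sketch. The enum below is a local mirror of the listed enumerator values inside a hypothetical namespace `sketch`; it is not the libcudf header, and the `operator|` body is an assumed implementation that only follows the documented signature. It shows that the composite enumerators are plain bitwise ORs and that a combination such as NUMERIC|SPACE yields a value with no named enumerator.

```cpp
#include <cstdint>

// Hypothetical local mirror of the documented enum; not the libcudf header.
namespace sketch {
enum string_character_types : std::uint32_t {
  DECIMAL    = 1 << 0,
  NUMERIC    = 1 << 1,
  DIGIT      = 1 << 2,
  ALPHA      = 1 << 3,
  SPACE      = 1 << 4,
  UPPER      = 1 << 5,
  LOWER      = 1 << 6,
  ALPHANUM   = DECIMAL | NUMERIC | DIGIT | ALPHA,
  CASE_TYPES = UPPER | LOWER,
  ALL_TYPES  = ALPHANUM | CASE_TYPES | SPACE
};

// Assumed implementation: OR the underlying values and cast back to the enum.
constexpr string_character_types operator|(string_character_types lhs,
                                           string_character_types rhs)
{
  return static_cast<string_character_types>(static_cast<std::uint32_t>(lhs) |
                                             static_cast<std::uint32_t>(rhs));
}
}  // namespace sketch

// The composite enumerators are the OR of the single-bit flags.
static_assert(sketch::ALPHANUM == 0x0Fu, "bits 0-3");
static_assert(sketch::CASE_TYPES == 0x60u, "bits 5-6");
static_assert(sketch::ALL_TYPES == 0x7Fu, "bits 0-6");

// NUMERIC|SPACE is a valid value (bits 1 and 4) with no named enumerator.
static_assert((sketch::NUMERIC | sketch::SPACE) == 0x12u, "bits 1 and 4");
```

Because the enum has a fixed underlying type, every `uint32_t` value is representable, which is what makes the cast in `operator|` well defined.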

Function Documentation

◆ all_characters_of_type()

std::unique_ptr<column> cudf::strings::all_characters_of_type ( strings_column_view const &  input,
string_character_types  types,
string_character_types  verify_types = string_character_types::ALL_TYPES,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Returns a boolean column identifying string entries in which all characters are of the type specified.

The output row entry will be set to false if the corresponding string element is empty or has at least one character not of the specified type. If all characters fit the type then true is set in that output row entry.

To check only specific character types, set verify_types to those types; characters of any other type are then ignored. Otherwise, the default ALL_TYPES verifies that every character matches types.

Example:
s = ['ab', 'a b', 'a7', 'a B']
b1 = all_characters_of_type(s, LOWER)
b1 is [true, false, false, false]
b2 = all_characters_of_type(s, LOWER, LOWER|UPPER)
b2 is [true, true, true, false]

Any null row results in a null entry for that row in the output column.

Parameters
input  Strings instance for this operation
types  The character types to check in each string
verify_types  Only verify against these character types. Default ALL_TYPES means return true iff all characters match types.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New column of boolean results for each string
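
The row-level rules described above can be modelled on the CPU. The following is a minimal sketch written from the documented behavior only; it is not libcudf's implementation, it runs on the host rather than the GPU, and it classifies ASCII characters with `<cctype>` instead of libcudf's character tables. The namespace `sketch`, the helper `classify`, and the use of `std::optional` to stand in for null rows are all assumptions made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace; mirrors the documented enum values
enum string_character_types : std::uint32_t {
  DECIMAL = 1 << 0, NUMERIC = 1 << 1, DIGIT = 1 << 2, ALPHA = 1 << 3,
  SPACE = 1 << 4, UPPER = 1 << 5, LOWER = 1 << 6,
  ALPHANUM = DECIMAL | NUMERIC | DIGIT | ALPHA,
  CASE_TYPES = UPPER | LOWER,
  ALL_TYPES = ALPHANUM | CASE_TYPES | SPACE
};

// ASCII-only stand-in for a character-type lookup (assumption).
inline std::uint32_t classify(unsigned char c)
{
  std::uint32_t flags = 0;
  if (std::isdigit(c)) flags |= DECIMAL | NUMERIC | DIGIT;
  if (std::isalpha(c)) flags |= ALPHA;
  if (std::isspace(c)) flags |= SPACE;
  if (std::isupper(c)) flags |= UPPER;
  if (std::islower(c)) flags |= LOWER;
  return flags;
}

// std::nullopt models a null row; null in gives null out.
std::vector<std::optional<bool>> all_characters_of_type(
  std::vector<std::optional<std::string>> const& input,
  std::uint32_t types,
  std::uint32_t verify_types = ALL_TYPES)
{
  std::vector<std::optional<bool>> out;
  out.reserve(input.size());
  for (auto const& row : input) {
    if (!row) { out.push_back(std::nullopt); continue; }
    std::size_t checked = 0;  // characters that were actually verified
    bool ok = true;
    for (unsigned char c : *row) {
      auto const flags = classify(c);
      // With ALL_TYPES every character is verified, including untyped ones.
      bool const verify =
        (flags & verify_types) != 0 || (flags == 0 && verify_types == ALL_TYPES);
      if (!verify) continue;  // ignored by verify_types
      ++checked;
      if ((flags & types) == 0) { ok = false; break; }
    }
    // Empty strings (nothing verified) yield false.
    out.push_back(ok && checked > 0);
  }
  return out;
}
}  // namespace sketch
```

Calling the sketch with the rows from the example, `LOWER`, and then `LOWER | UPPER` as `verify_types` reproduces the `b1` and `b2` results listed above.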

◆ filter_characters_of_type()

std::unique_ptr<column> cudf::strings::filter_characters_of_type ( strings_column_view const &  input,
string_character_types  types_to_remove,
string_scalar const &  replacement = string_scalar(""),
string_character_types  types_to_keep = string_character_types::ALL_TYPES,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource *  mr = rmm::mr::get_current_device_resource() 
)

Filter specific character types from a column of strings.

To remove all characters of a specific type, set that type in types_to_remove and set types_to_keep to ALL_TYPES.

To remove every character that is NOT of a selected type, set types_to_remove to ALL_TYPES and list the types to retain in types_to_keep.

Example:
s = ['ab', 'a b', 'a7bb', 'A7B234']
s1 = filter_characters_of_type(s, NUMERIC, "", ALL_TYPES)
s1 is ['ab', 'a b', 'abb', 'AB']
s2 = filter_characters_of_type(s, ALL_TYPES, "-", LOWER)
s2 is ['ab', 'a-b', 'a-bb', '------']

In s1 all NUMERIC characters have been removed. In s2 all non-LOWER characters have been replaced with "-".

Exactly one of the parameters types_to_remove and types_to_keep must be set to ALL_TYPES.

Any null row results in a null entry for that row in the output column.

Exceptions
cudf::logic_error  if neither or both of types_to_remove and types_to_keep are set to ALL_TYPES.
Parameters
input  Strings instance for this operation
types_to_remove  The character types to check in each string. Use ALL_TYPES here to specify types_to_keep instead.
replacement  The replacement character to use when removing characters
types_to_keep  Default ALL_TYPES means all characters of types_to_remove will be filtered.
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
New strings column with the selected characters removed or replaced
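
The remove/keep rules and the ALL_TYPES precondition can be sketched as a host-side model. This is written from the documented behavior only and is not libcudf's implementation: it is ASCII-only, uses `std::optional` for null rows, takes the replacement as a plain `std::string`, and throws `std::logic_error` where the library documents `cudf::logic_error`. The namespace `sketch` and the helper `classify` are hypothetical names.

```cpp
#include <cctype>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace; mirrors the documented enum values
enum string_character_types : std::uint32_t {
  DECIMAL = 1 << 0, NUMERIC = 1 << 1, DIGIT = 1 << 2, ALPHA = 1 << 3,
  SPACE = 1 << 4, UPPER = 1 << 5, LOWER = 1 << 6,
  ALPHANUM = DECIMAL | NUMERIC | DIGIT | ALPHA,
  CASE_TYPES = UPPER | LOWER,
  ALL_TYPES = ALPHANUM | CASE_TYPES | SPACE
};

// ASCII-only stand-in for a character-type lookup (assumption).
inline std::uint32_t classify(unsigned char c)
{
  std::uint32_t flags = 0;
  if (std::isdigit(c)) flags |= DECIMAL | NUMERIC | DIGIT;
  if (std::isalpha(c)) flags |= ALPHA;
  if (std::isspace(c)) flags |= SPACE;
  if (std::isupper(c)) flags |= UPPER;
  if (std::islower(c)) flags |= LOWER;
  return flags;
}

std::vector<std::optional<std::string>> filter_characters_of_type(
  std::vector<std::optional<std::string>> const& input,
  std::uint32_t types_to_remove,
  std::string const& replacement = "",
  std::uint32_t types_to_keep    = ALL_TYPES)
{
  bool const remove_all = types_to_remove == ALL_TYPES;
  bool const keep_all   = types_to_keep == ALL_TYPES;
  // Exactly one of the two masks must be ALL_TYPES.
  if (remove_all == keep_all) {
    throw std::logic_error("exactly one of types_to_remove/types_to_keep must be ALL_TYPES");
  }
  std::vector<std::optional<std::string>> out;
  out.reserve(input.size());
  for (auto const& row : input) {
    if (!row) { out.push_back(std::nullopt); continue; }  // null in, null out
    std::string result;
    for (char ch : *row) {
      auto const flags = classify(static_cast<unsigned char>(ch));
      // Keep-mode retains only listed types; remove-mode drops only listed types.
      bool const keep = remove_all ? (flags & types_to_keep) != 0
                                   : (flags & types_to_remove) == 0;
      if (keep) { result.push_back(ch); } else { result += replacement; }
    }
    out.push_back(std::move(result));
  }
  return out;
}
}  // namespace sketch
```

With the rows from the example, passing `NUMERIC` as `types_to_remove` reproduces `s1`, and passing `ALL_TYPES`, `"-"`, and `LOWER` reproduces `s2`.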

◆ operator|()

constexpr string_character_types cudf::strings::operator| ( string_character_types  lhs,
string_character_types  rhs 
)

OR operator for combining string_character_types.

Parameters
lhs  left-hand side of OR operation
rhs  right-hand side of OR operation
Returns
combined string_character_types

Definition at line 58 of file char_types_enum.hpp.

◆ operator|=()

constexpr string_character_types& cudf::strings::operator|= ( string_character_types &  lhs,
string_character_types  rhs 
)

Compound assignment OR operator for combining string_character_types.

Parameters
lhs  left-hand side of OR operation
rhs  right-hand side of OR operation
Returns
Reference to lhs after combining lhs and rhs

Definition at line 72 of file char_types_enum.hpp.
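
A common use of a compound OR operator is to accumulate a mask from several flags. The sketch below illustrates that pattern with a local mirror of the enum in a hypothetical namespace `sketch`. The `operator|=` body is an assumed implementation that only follows the documented signature and return value, and the helper `combine` is invented for this example and is not part of libcudf.

```cpp
#include <cstdint>
#include <initializer_list>

namespace sketch {  // hypothetical namespace; mirrors the documented enum values
enum string_character_types : std::uint32_t {
  DECIMAL = 1 << 0, NUMERIC = 1 << 1, DIGIT = 1 << 2, ALPHA = 1 << 3,
  SPACE = 1 << 4, UPPER = 1 << 5, LOWER = 1 << 6,
  ALPHANUM = DECIMAL | NUMERIC | DIGIT | ALPHA,
  CASE_TYPES = UPPER | LOWER,
  ALL_TYPES = ALPHANUM | CASE_TYPES | SPACE
};

// Assumed implementation: OR into lhs and return a reference to it.
constexpr string_character_types& operator|=(string_character_types& lhs,
                                             string_character_types rhs)
{
  lhs = static_cast<string_character_types>(static_cast<std::uint32_t>(lhs) |
                                            static_cast<std::uint32_t>(rhs));
  return lhs;
}

// Hypothetical helper: fold a list of flags into one mask using operator|=.
constexpr string_character_types combine(std::initializer_list<string_character_types> parts)
{
  string_character_types mask{};  // starts with no bits set
  for (auto p : parts) { mask |= p; }
  return mask;
}
}  // namespace sketch

// Accumulating UPPER and LOWER gives the same value as CASE_TYPES.
static_assert(sketch::combine({sketch::UPPER, sketch::LOWER}) == sketch::CASE_TYPES,
              "UPPER|LOWER == CASE_TYPES");
```

Returning a reference to `lhs` lets the operator be chained or used inside a larger expression, as with the built-in compound assignment operators.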