libcudf 23.12.00
Class to build parquet_writer_options.
#include <parquet.hpp>
Public Member Functions
parquet_writer_options_builder ()=default
Default constructor.
parquet_writer_options_builder (sink_info const &sink, table_view const &table)
Constructor from sink and table.
parquet_writer_options_builder & partitions (std::vector< partition_info > partitions)
Sets partitions in parquet_writer_options.
parquet_writer_options_builder & metadata (table_input_metadata metadata)
Sets metadata in parquet_writer_options.
parquet_writer_options_builder & key_value_metadata (std::vector< std::map< std::string, std::string >> metadata)
Sets Key-Value footer metadata in parquet_writer_options.
parquet_writer_options_builder & stats_level (statistics_freq sf)
Sets the level of statistics in parquet_writer_options.
parquet_writer_options_builder & compression (compression_type compression)
Sets compression type in parquet_writer_options.
parquet_writer_options_builder & column_chunks_file_paths (std::vector< std::string > file_paths)
Sets column chunks file path to be set in the raw output metadata.
parquet_writer_options_builder & row_group_size_bytes (size_t val)
Sets the maximum row group size, in bytes.
parquet_writer_options_builder & row_group_size_rows (size_type val)
Sets the maximum number of rows in output row groups.
parquet_writer_options_builder & max_page_size_bytes (size_t val)
Sets the maximum uncompressed page size, in bytes.
parquet_writer_options_builder & max_page_size_rows (size_type val)
Sets the maximum page size, in rows. Counts only top-level rows, ignoring any nesting. Cannot be larger than the row group size in rows, and will be adjusted to match if it is.
parquet_writer_options_builder & column_index_truncate_length (int32_t val)
Sets the desired maximum size in bytes for min and max values in the column index.
parquet_writer_options_builder & dictionary_policy (enum dictionary_policy val)
Sets the policy for dictionary use.
parquet_writer_options_builder & max_dictionary_size (size_t val)
Sets the maximum dictionary size, in bytes.
parquet_writer_options_builder & max_page_fragment_size (size_type val)
Sets the maximum page fragment size, in rows.
parquet_writer_options_builder & compression_statistics (std::shared_ptr< writer_compression_statistics > const &comp_stats)
Sets the pointer to the output compression statistics.
parquet_writer_options_builder & int96_timestamps (bool enabled)
Sets whether int96 timestamps are written or not in parquet_writer_options.
parquet_writer_options_builder & utc_timestamps (bool enabled)
Set to true if timestamps are to be written as UTC.
parquet_writer_options_builder & write_v2_headers (bool enabled)
Set to true if V2 page headers are to be written.
operator parquet_writer_options && ()
move parquet_writer_options member once it's built.
parquet_writer_options && build ()
move parquet_writer_options member once it's built.
Class to build parquet_writer_options.
Definition at line 894 of file parquet.hpp.
cudf::io::parquet_writer_options_builder::parquet_writer_options_builder () [explicit, default]
Default constructor.
This has been added since Cython requires a default constructor to create objects on stack.
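cudf::io::parquet_writer_options_builder::parquet_writer_options_builder (sink_info const &sink, table_view const &table) [inline, explicit]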
Constructor from sink and table.
sink | The sink used for writer output |
table | Table to be written to output |
Definition at line 911 of file parquet.hpp.
parquet_writer_options&& cudf::io::parquet_writer_options_builder::build () [inline]
move parquet_writer_options member once it's built.
This has been added since Cython does not support overloading of conversion operators.
Returns parquet_writer_options object's r-value reference.
Definition at line 1152 of file parquet.hpp.
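The chaining style that this builder exposes can be illustrated without libcudf itself. The sketch below uses stand-in types in a hypothetical `sketch` namespace (`writer_options`, `writer_options_builder`, and `make_example_options` are illustrative names, not libcudf symbols). It only mirrors the documented shape: setters return the builder by reference, and both the conversion operator and `build()` hand out the options member as an r-value reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

namespace sketch {

// Stand-in for an options object; NOT cudf::io::parquet_writer_options.
struct writer_options {
  std::string sink_path;
  std::size_t row_group_size_bytes = 0;
  bool int96_timestamps            = false;
};

// Stand-in builder that mirrors the shape of the documented interface.
class writer_options_builder {
  writer_options options_;

 public:
  writer_options_builder() = default;

  explicit writer_options_builder(std::string sink_path)
  {
    options_.sink_path = std::move(sink_path);
  }

  // Each setter returns *this by reference so that calls can be chained.
  writer_options_builder& row_group_size_bytes(std::size_t val)
  {
    options_.row_group_size_bytes = val;
    return *this;
  }

  writer_options_builder& int96_timestamps(bool enabled)
  {
    options_.int96_timestamps = enabled;
    return *this;
  }

  // Conversion operator: exposes the built options as an r-value reference.
  operator writer_options&&() { return std::move(options_); }

  // Named equivalent of the conversion operator.
  writer_options&& build() { return std::move(options_); }
};

// Hypothetical helper showing a typical chained call ending in build().
inline writer_options make_example_options()
{
  return writer_options_builder{"out.parquet"}
    .row_group_size_bytes(64 * 1024 * 1024)
    .int96_timestamps(true)
    .build();
}

}  // namespace sketch
```

With the real class, the same pattern would presumably start from the sink-and-table constructor and chain the setters listed on this page before calling `build()`.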
parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::column_chunks_file_paths (std::vector< std::string > file_paths)
Sets column chunks file path to be set in the raw output metadata.
file_paths | Vector of Strings which indicates file path. Must be same size as number of data sinks |
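parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::column_index_truncate_length (int32_t val) [inline]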
Sets the desired maximum size in bytes for min and max values in the column index.
Values exceeding this limit will be truncated, but modified such that they will still be valid lower and upper bounds. This only applies to variable length types, such as string. Maximum values will not be truncated if there is no suitable truncation that results in a valid upper bound.
Default value is 64.
val | length min/max will be truncated to, with 0 indicating no truncation |
Definition at line 1045 of file parquet.hpp.
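One way to picture truncation that keeps bounds valid is the following sketch. It assumes plain byte-wise (unsigned) string ordering and is not taken from the libcudf implementation, which may treat encodings differently. The function names `truncate_min` and `truncate_max` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Lower bound: a prefix never compares greater than the full string,
// so cutting the value keeps it a valid minimum. max_len == 0 means
// "no truncation", matching the documented meaning of 0.
inline std::string truncate_min(std::string const& value, std::size_t max_len)
{
  if (max_len == 0 || value.size() <= max_len) { return value; }
  return value.substr(0, max_len);
}

// Upper bound: cut to max_len, then bump the last byte that is not 0xFF
// and drop everything after it. If every kept byte is 0xFF there is no
// shorter valid upper bound, so the original value is returned unchanged.
inline std::string truncate_max(std::string const& value, std::size_t max_len)
{
  if (max_len == 0 || value.size() <= max_len) { return value; }
  std::string result = value.substr(0, max_len);
  while (!result.empty()) {
    auto const last = static_cast<unsigned char>(result.back());
    if (last != 0xFF) {
      result.back() = static_cast<char>(last + 1);
      return result;
    }
    result.pop_back();
  }
  return value;
}

}  // namespace sketch
```

For example, with a limit of 3 the value `"abcdef"` would become `"abc"` as a minimum and `"abd"` as a maximum, and both still bound the original string.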
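parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::compression (compression_type compression) [inline]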
Sets compression type in parquet_writer_options.
compression | The compression type to use |
Definition at line 964 of file parquet.hpp.
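parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::compression_statistics (std::shared_ptr< writer_compression_statistics > const &comp_stats) [inline]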
Sets the pointer to the output compression statistics.
comp_stats | Pointer to compression statistics to be filled once writer is done |
Definition at line 1101 of file parquet.hpp.
parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::dictionary_policy (enum dictionary_policy val)
Sets the policy for dictionary use.
Certain compression algorithms (e.g., Zstandard) have limits on how large a buffer can be compressed. In some circumstances, the dictionary can grow beyond this limit, which will prevent the column from being compressed. This setting controls how the writer should act in these circumstances. A setting of dictionary_policy::ADAPTIVE will disable dictionary encoding for columns where the dictionary exceeds the limit. A setting of dictionary_policy::NEVER will disable the use of dictionary encoding globally. A setting of dictionary_policy::ALWAYS will allow the use of dictionary encoding even if it will result in the disabling of compression for columns that would otherwise be compressed.
The default value is dictionary_policy::ALWAYS.
val | policy for dictionary use |
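parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::int96_timestamps (bool enabled) [inline]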
Sets whether int96 timestamps are written or not in parquet_writer_options.
enabled | Boolean value to enable/disable int96 timestamps |
Definition at line 1114 of file parquet.hpp.
parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::key_value_metadata (std::vector< std::map< std::string, std::string >> metadata)
Sets Key-Value footer metadata in parquet_writer_options.
metadata | Key-Value footer metadata |
parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::max_dictionary_size (size_t val)
Sets the maximum dictionary size, in bytes.
Disables dictionary encoding for any column chunk where the dictionary will exceed this limit. Only used when the dictionary_policy is set to 'ADAPTIVE'.
Default value is 1048576 (1MiB).
val | maximum dictionary size |
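How the dictionary policy and the maximum dictionary size interact can be summarized as a small decision function. This is a sketch of the documented behaviour only: the enum below is a stand-in declared in a hypothetical `sketch` namespace, `use_dictionary` is an illustrative name, and the real writer may apply further conditions that this page does not describe.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Stand-in for the policy values named in the documentation.
enum class dictionary_policy { NEVER, ADAPTIVE, ALWAYS };

// Decides whether a column chunk would be dictionary encoded.
// `dictionary_bytes` is the size the dictionary would reach and
// `max_dictionary_bytes` is the configured limit (documented default: 1048576).
inline bool use_dictionary(dictionary_policy policy,
                           std::size_t dictionary_bytes,
                           std::size_t max_dictionary_bytes)
{
  switch (policy) {
    case dictionary_policy::NEVER: return false;  // disabled globally
    case dictionary_policy::ADAPTIVE:
      // The limit is only consulted under ADAPTIVE.
      return dictionary_bytes <= max_dictionary_bytes;
    case dictionary_policy::ALWAYS: return true;  // even if compression is lost
  }
  return true;  // unreachable for valid enum values
}

}  // namespace sketch
```

Under this reading, changing `max_dictionary_size` has no effect unless the policy is ADAPTIVE.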
parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::max_page_fragment_size (size_type val)
Sets the maximum page fragment size, in rows.
Files with nested schemas or very long strings may need a page fragment size smaller than the default value of 5000 to ensure a single fragment will not exceed the desired maximum page size in bytes.
val | maximum page fragment size |
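As a rough aid for choosing a fragment size, the sketch below estimates how many rows fit in one page given an average row size and caps the result at the documented default of 5000. The helper `suggest_fragment_rows` and the idea of using an average row size are assumptions made for illustration; libcudf does not document such a function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Default page fragment size stated in the documentation, in rows.
constexpr std::int32_t default_fragment_rows = 5000;

// Suggests a fragment size (in rows) such that a single fragment of
// average-sized rows stays within `max_page_bytes`. Never returns less
// than 1 row or more than the documented default.
inline std::int32_t suggest_fragment_rows(std::size_t max_page_bytes, std::size_t avg_row_bytes)
{
  assert(avg_row_bytes > 0);
  std::size_t const rows_that_fit = std::max<std::size_t>(max_page_bytes / avg_row_bytes, 1);
  return static_cast<std::int32_t>(
    std::min<std::size_t>(rows_that_fit, static_cast<std::size_t>(default_fragment_rows)));
}

}  // namespace sketch
```

For instance, rows averaging 4096 bytes with a 1 MiB page target would suggest 256 rows per fragment, well below the default.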
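parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::max_page_size_bytes (size_t val) [inline]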
Sets the maximum uncompressed page size, in bytes.
Serves as a hint to the writer, and can be exceeded under certain circumstances. Cannot be larger than the row group size in bytes, and will be adjusted to match if it is.
val | maximum page size |
Definition at line 1013 of file parquet.hpp.
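parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::max_page_size_rows (size_type val) [inline]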
Sets the maximum page size, in rows. Counts only top-level rows, ignoring any nesting. Cannot be larger than the row group size in rows, and will be adjusted to match if it is.
val | maximum rows per page |
Definition at line 1026 of file parquet.hpp.
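The adjustment described for both page size limits amounts to clamping them to the corresponding row group limits. A minimal sketch, using a hypothetical `size_limits` struct and `clamp_page_limits` function that are not part of libcudf:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical container for the four limits; not a libcudf type.
struct size_limits {
  std::size_t row_group_bytes;
  std::int32_t row_group_rows;
  std::size_t page_bytes;
  std::int32_t page_rows;
};

// Returns the limits with each page limit reduced to its row group limit
// whenever the requested page limit is larger.
inline size_limits clamp_page_limits(size_limits requested)
{
  requested.page_bytes = std::min(requested.page_bytes, requested.row_group_bytes);
  requested.page_rows  = std::min(requested.page_rows, requested.row_group_rows);
  return requested;
}

}  // namespace sketch
```

Page limits that are already within the row group limits pass through unchanged.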
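parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::metadata (table_input_metadata metadata) [inline]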
Sets metadata in parquet_writer_options.
metadata | Associated metadata |
Definition at line 931 of file parquet.hpp.
parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::partitions (std::vector< partition_info > partitions)
Sets partitions in parquet_writer_options.
partitions | Partitions of input table in {start_row, num_rows} pairs. If specified, must be same size as number of sinks in sink_info |
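To make the {start_row, num_rows} layout concrete, the sketch below splits a table into one contiguous partition per sink. The `partition` struct and `even_partitions` function are hypothetical stand-ins (the real type is `partition_info`), and an even split is only one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// Stand-in for a {start_row, num_rows} pair; not cudf::io::partition_info.
struct partition {
  std::int32_t start_row;
  std::int32_t num_rows;
};

// Splits `total_rows` into `num_sinks` contiguous partitions whose sizes
// differ by at most one row. The result has exactly one entry per sink.
inline std::vector<partition> even_partitions(std::int32_t total_rows, std::int32_t num_sinks)
{
  assert(total_rows >= 0 && num_sinks > 0);
  std::vector<partition> parts;
  parts.reserve(static_cast<std::size_t>(num_sinks));
  std::int32_t const base  = total_rows / num_sinks;
  std::int32_t const extra = total_rows % num_sinks;  // first `extra` parts get one more row
  std::int32_t start       = 0;
  for (std::int32_t i = 0; i < num_sinks; ++i) {
    std::int32_t const rows = base + (i < extra ? 1 : 0);
    parts.push_back({start, rows});
    start += rows;
  }
  return parts;
}

}  // namespace sketch
```

Splitting 10 rows across 3 sinks this way yields the pairs {0, 4}, {4, 3}, and {7, 3}.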
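parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::row_group_size_bytes (size_t val) [inline]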
Sets the maximum row group size, in bytes.
val | maximum row group size |
Definition at line 985 of file parquet.hpp.
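parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::row_group_size_rows (size_type val) [inline]
Sets the maximum number of rows in output row groups.
val | maximum number of rows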
Definition at line 997 of file parquet.hpp.
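parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::stats_level (statistics_freq sf) [inline]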
Sets the level of statistics in parquet_writer_options.
sf | Level of statistics requested in the output file |
Definition at line 952 of file parquet.hpp.
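parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::utc_timestamps (bool enabled) [inline]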
Set to true if timestamps are to be written as UTC.
enabled | Boolean value to enable/disable writing of timestamps as UTC. |
Definition at line 1126 of file parquet.hpp.
parquet_writer_options_builder& cudf::io::parquet_writer_options_builder::write_v2_headers (bool enabled)
Set to true if V2 page headers are to be written.
enabled | Boolean value to enable/disable writing of V2 page headers. |