| % Generated by roxygen2: do not edit by hand |
| % Please edit documentation in R/schema.R |
| \docType{class} |
| \name{Schema} |
| \alias{Schema} |
| \alias{schema} |
| \title{Schema class} |
| \usage{ |
| schema(...) |
| } |
| \arguments{ |
| \item{...}{named list of \link[=data-type]{data types}} |
| } |
| \description{ |
| A \code{Schema} is a list of \link{Field}s, which map names to |
| Arrow \link[=data-type]{data types}. Create a \code{Schema} when you |
| want to convert an R \code{data.frame} to Arrow without relying on the |
| default mapping of R types to Arrow types, for example to choose a |
| specific numeric precision. A \code{Schema} is also useful when creating a |
| \link{Dataset} with a fixed schema rather than one inferred from the various files. |
| |
| Many Arrow objects, including \link{Table} and \link{Dataset}, have a \verb{$schema} |
| active binding that lets you access their schema. |
| } |
| \section{Methods}{ |
| |
| \itemize{ |
| \item \verb{$ToString()}: convert to a string |
| \item \verb{$field(i)}: returns the field at index \code{i} (0-based) |
| \item \verb{$GetFieldByName(x)}: returns the field with name \code{x} |
| \item \verb{$WithMetadata(metadata)}: returns a new \code{Schema} with the key-value |
| \code{metadata} set. Note that all list elements in \code{metadata} will be coerced |
| to \code{character}. |
| } |
| } |
| |
| \section{Active bindings}{ |
| |
| \itemize{ |
| \item \verb{$names}: returns the field names (called by \code{names(Schema)}) |
| \item \verb{$num_fields}: returns the number of fields (called by \code{length(Schema)}) |
| \item \verb{$fields}: returns the list of \code{Field}s in the \code{Schema}, suitable for |
| iterating over |
| \item \verb{$HasMetadata}: logical: does this \code{Schema} have extra metadata? |
| \item \verb{$metadata}: returns the key-value metadata as a named list. |
| Modify or replace by assigning in (\code{sch$metadata <- new_metadata}). |
| All list elements are coerced to string. |
| } |
| } |
| |
| \section{R Metadata}{ |
| |
| |
| When converting a \code{data.frame} to an Arrow Table or RecordBatch, attributes |
| from the \code{data.frame} are saved alongside the table so that the object can be |
| reconstructed faithfully in R (e.g. with \code{as.data.frame()}). This metadata |
| can be stored at the top level of the \code{data.frame} (e.g. \code{attributes(df)}), |
| at the column level (e.g. \code{attributes(df$col_a)}), or, for list columns only, |
| at the element level (e.g. \code{attributes(df[1, "col_a"])}). For example, this allows |
| \code{haven} columns to be stored in a table and re-created faithfully |
| when pulled back into R. This metadata is separate from the |
| schema (column names and types), which is compatible with other Arrow |
| clients. The R metadata is only read by R and is ignored by other clients |
| (e.g. Pandas has its own custom metadata). It is stored in |
| \verb{$metadata$r}. |
| |
| Since Schema metadata keys and values must be strings, this metadata is |
| saved by serializing R's attribute list structure to a string. Starting in |
| version 3.0.0, serialized metadata larger than 100Kb is compressed by |
| default. To disable this compression (e.g. for tables that must remain |
| compatible with Arrow versions before 3.0.0 and include large |
| amounts of metadata), set the option \code{arrow.compress_metadata} to \code{FALSE}. |
| Older versions of arrow can still read files with compressed metadata, but |
| the metadata is dropped. |
| } |
| |
| \examples{ |
| \donttest{ |
| df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5)) |
| tab1 <- Table$create(df) |
| tab1$schema |
| tab2 <- Table$create(df, schema = schema(col1 = int8(), col2 = float32())) |
| tab2$schema |
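| |
| # A minimal sketch of the methods and active bindings documented above. |
| # The field names and the metadata key/value are illustrative assumptions, |
| # not part of the original example. |
| sch <- schema(col1 = int8(), col2 = float32()) |
| sch$names |
| sch$num_fields |
| # $field() uses a 0-based index, so 0 selects the first field |
| sch$field(0) |
| sch$GetFieldByName("col2") |
| # $WithMetadata() returns a new Schema; values are coerced to character |
| sch_meta <- sch$WithMetadata(list(source = "example")) |
| sch_meta$HasMetadata |
| sch_meta$metadata |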
| } |
| } |