arrow-tiberius is a Rust library for bridging Apache Arrow and Microsoft SQL
Server through the Tiberius TDS driver.
The crate is designed around a bidirectional boundary:

Arrow Schema + RecordBatch values
  -> SQL Server write plan and DDL
  -> SQL Server bulk load through Tiberius

SQL Server metadata and rows through Tiberius
  -> Arrow schema and RecordBatch values

The v0.1 release implements the Arrow-to-SQL Server write path first. The public API is still intentionally shaped around Arrow, SQL Server profiles, structured diagnostics, and directional modules so a SQL Server-to-Arrow read path can be added without renaming the crate or replacing the core model.
Note: v0.1 implements the Arrow-to-SQL Server direction only. SQL Server-to-Arrow reading is reserved for a later release.
In v0.1, arrow-tiberius provides:
- Arrow-to-SQL Server schema planning.
- SQL Server identifiers, type metadata, compatibility profile, and DDL helpers.
- Structured planning and runtime diagnostics.
- Arrow RecordBatch bulk writing through Tiberius.
- Baseline and optimized writer backend selection.
- SQL Server integration tests and writer benchmark harnesses.
It does not provide SQL Server-to-Arrow reads yet.
Add the crate:
[dependencies]
arrow-tiberius = "0.1"

Plan an Arrow schema and render deterministic CREATE TABLE SQL:
use arrow_schema::{DataType, Field, Schema};
use arrow_tiberius::{
    MssqlProfile, PlanOptions, TableName, create_table_sql_from_mappings,
    plan_arrow_schema_to_mssql_mappings,
};

fn main() -> arrow_tiberius::Result<()> {
    // Describe the Arrow columns to plan.
    let schema = Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, true),
    ]);

    // Plan SQL Server mappings for the schema under the v0.1 profile.
    let outcome = plan_arrow_schema_to_mssql_mappings(
        &schema,
        MssqlProfile::sql_server_2016_compat_100(),
        PlanOptions::default(),
    )?;

    // Render CREATE TABLE SQL from the planned mappings.
    let table = TableName::new("dbo", "people")?;
    let ddl = create_table_sql_from_mappings(&table, outcome.value());
    assert!(ddl.contains("CREATE TABLE [dbo].[people]"));
    Ok(())
}

Write batches to an existing SQL Server table with a crate-owned compatible SQL Server client:
use arrow_array::RecordBatch;
use arrow_tiberius::{
    MssqlProfile, PlanOptions, TableName, WriteBackend, WriteOptions,
    connect_mssql_client_from_ado_string, plan_arrow_schema_to_mssql_mappings,
};

async fn write_batch(
    connection_string: &str,
    batch: &RecordBatch,
) -> arrow_tiberius::Result<()> {
    // Connect through the crate-owned client API.
    let mut client = connect_mssql_client_from_ado_string(connection_string).await?;

    // Plan column mappings from the batch's own schema.
    let outcome = plan_arrow_schema_to_mssql_mappings(
        batch.schema().as_ref(),
        MssqlProfile::sql_server_2016_compat_100(),
        PlanOptions::default(),
    )?;

    // Open a bulk writer on the existing table with an explicit backend.
    let table = TableName::new("dbo", "people")?;
    let mut writer = client
        .bulk_writer(
            table,
            outcome.value().to_vec(),
            WriteOptions {
                backend: WriteBackend::DirectRawBulk,
                ..WriteOptions::default()
            },
        )
        .await?;

    // Send the batch, then finish the bulk load.
    writer.write_batch(batch).await?;
    writer.finish().await?;
    Ok(())
}

The connected writer validates the target table metadata before sending rows. It does not create the target table automatically; callers can use the DDL helpers when they want this crate to produce the table definition.
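As a rough illustration of what validating a target table before a write can involve, the sketch below compares planned columns with the columns a table reports. Everything in it is hypothetical: the `Column` struct, the `check_target` function, and the SQL type strings are stand-ins chosen for the example, not types or mapping output from arrow-tiberius, and the crate's actual validation rules may differ.

```rust
/// Hypothetical column description (not a type from arrow-tiberius).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Column {
    name: String,
    sql_type: String,
    nullable: bool,
}

impl Column {
    fn new(name: &str, sql_type: &str, nullable: bool) -> Self {
        Column {
            name: name.to_string(),
            sql_type: sql_type.to_string(),
            nullable,
        }
    }
}

/// Compare planned columns with the target table's columns, position by position.
/// Returns one message per mismatch; an empty vector means no problem was found.
fn check_target(planned: &[Column], target: &[Column]) -> Vec<String> {
    let mut problems = Vec::new();
    if planned.len() != target.len() {
        problems.push(format!(
            "column count differs: planned {}, target {}",
            planned.len(),
            target.len()
        ));
    }
    for (p, t) in planned.iter().zip(target) {
        if p.name != t.name {
            problems.push(format!("found [{}] where [{}] was planned", t.name, p.name));
        } else if p.sql_type != t.sql_type {
            problems.push(format!("[{}] is {}, planned {}", p.name, t.sql_type, p.sql_type));
        } else if p.nullable && !t.nullable {
            problems.push(format!("[{}] is NOT NULL but the planned column is nullable", p.name));
        }
    }
    problems
}

fn main() {
    // Type names here are illustrative only.
    let planned = vec![
        Column::new("id", "bigint", false),
        Column::new("name", "nvarchar(max)", true),
    ];
    let target = vec![
        Column::new("id", "bigint", false),
        Column::new("name", "nvarchar(max)", false),
    ];
    for problem in check_target(&planned, &target) {
        println!("{problem}");
    }
}
```

Failing before any rows are sent keeps a mismatched table from receiving a partial load, which is the practical benefit of checking metadata up front.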
Planning and write failures return structured diagnostics instead of relying on string parsing. Callers can inspect severity, machine-readable code, field, row, and message.
use arrow_schema::{DataType, Field, Schema};
use arrow_tiberius::{
    Error, MssqlProfile, PlanOptions, plan_arrow_schema_to_mssql_mappings,
};

// UInt64 requires an explicit policy, so planning with default options fails.
let schema = Schema::new(vec![Field::new("raw", DataType::UInt64, false)]);
let err = plan_arrow_schema_to_mssql_mappings(
    &schema,
    MssqlProfile::sql_server_2016_compat_100(),
    PlanOptions::default(),
)
.expect_err("UInt64 requires an explicit policy by default");

// Inspect each diagnostic through its accessors instead of parsing text.
if let Error::Planning { diagnostics } = err {
    for diagnostic in diagnostics.all() {
        println!("{:?}: {}", diagnostic.code(), diagnostic.message());
    }
}

See Arrow to SQL Server Type Mapping for the full supported and unsupported mapping surface.
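To show why structured diagnostics are easier to act on than message strings, here is a minimal, self-contained sketch of caller-side handling. The `Severity` and `Diagnostic` types, the severity levels, and the `UNSUPPORTED_TYPE` and `LOSSY_MAPPING` codes are all assumptions made for illustration; they mirror the fields listed above (severity, code, field, row, message) but are not the crate's own definitions.

```rust
/// Hypothetical severity levels; the crate's real set may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Warning,
    Error,
}

/// Hypothetical diagnostic carrying the fields described above.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Diagnostic {
    severity: Severity,
    code: &'static str,
    field: Option<String>,
    row: Option<usize>,
    message: String,
}

/// Collect the codes of diagnostics that should stop a write.
fn blocking_codes(diagnostics: &[Diagnostic]) -> Vec<&'static str> {
    diagnostics
        .iter()
        .filter(|d| d.severity == Severity::Error)
        .map(|d| d.code)
        .collect()
}

/// Render one diagnostic with its location, if any.
fn describe(d: &Diagnostic) -> String {
    let field = d.field.as_deref().unwrap_or("<schema>");
    let location = match d.row {
        Some(row) => format!("{field}[row {row}]"),
        None => field.to_string(),
    };
    format!("{:?} {} at {}: {}", d.severity, d.code, location, d.message)
}

fn main() {
    let diagnostics = vec![
        Diagnostic {
            severity: Severity::Error,
            code: "UNSUPPORTED_TYPE", // hypothetical code
            field: Some("raw".to_string()),
            row: None,
            message: "UInt64 requires an explicit policy".to_string(),
        },
        Diagnostic {
            severity: Severity::Warning,
            code: "LOSSY_MAPPING", // hypothetical code
            field: Some("name".to_string()),
            row: Some(3),
            message: "value may be truncated".to_string(),
        },
    ];
    for d in &diagnostics {
        println!("{}", describe(d));
    }
    println!("blocking: {:?}", blocking_codes(&diagnostics));
}
```

Branching on a code and severity stays stable when message wording changes, which string matching does not.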
WriteBackend controls how planned Arrow rows are sent to SQL Server:
| Backend | Purpose |
|---|---|
| Auto | Default selection. Currently resolves to DirectRawBulk. |
| BaselineTokenRow | Compatibility and reference path using Tiberius TokenRow bulk load. |
| DirectFramedBulk | Direct Arrow-to-TDS row encoding through Tiberius framed writes. |
| DirectRawBulk | Optimized direct encoder plus raw bulk packet writes from the Tiberius fork. |
The direct raw backend is the optimized production path for currently supported mappings. The baseline backend remains useful for compatibility checks, debugging, and parity tests.
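The selection rule in the table can be pictured with a small stand-alone enum. This is a sketch only: the `Backend` enum and its `resolve` method are hypothetical mirrors of the variant names above, written to show the "Auto resolves to a concrete backend, explicit choices pass through" behavior, and are not the crate's implementation of WriteBackend.

```rust
/// Hypothetical mirror of the backend names listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Auto,
    BaselineTokenRow,
    DirectFramedBulk,
    DirectRawBulk,
}

impl Backend {
    /// Resolve `Auto` to a concrete backend; explicit choices are returned unchanged.
    fn resolve(self) -> Backend {
        match self {
            // Per the table above, Auto currently means the direct raw backend.
            Backend::Auto => Backend::DirectRawBulk,
            explicit => explicit,
        }
    }
}

fn main() {
    for requested in [
        Backend::Auto,
        Backend::BaselineTokenRow,
        Backend::DirectFramedBulk,
        Backend::DirectRawBulk,
    ] {
        println!("{:?} -> {:?}", requested, requested.resolve());
    }
}
```

Because the table says Auto "currently" resolves to DirectRawBulk, callers that need a fixed backend, for example in parity tests against the baseline path, are probably better served by naming it explicitly.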
Compile-checked examples are available under examples/ and do not require SQL
Server:
cargo run --example schema_to_ddl
cargo run --example planning_diagnostics
cargo run --example backend_selection
cargo run --example policy_dependent_planning

The examples cover schema to DDL, planning diagnostics, backend selection, and policy-dependent planning.
An environment-gated SQL Server write example is also available:
ARROW_TIBERIUS_EXAMPLE_MSSQL_URL='server=tcp:localhost,1433;user=sa;password=...;TrustServerCertificate=true' \
cargo run --example sqlserver_batch_write

By default it creates, writes to, and drops [dbo].[arrow_tiberius_example_write].
Set ARROW_TIBERIUS_EXAMPLE_KEEP_TABLE=1 to keep the disposable table, or set
ARROW_TIBERIUS_EXAMPLE_MSSQL_SCHEMA, ARROW_TIBERIUS_EXAMPLE_MSSQL_TABLE,
and ARROW_TIBERIUS_EXAMPLE_EXISTING_TABLE=1 to write to an existing table
explicitly.
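The environment variables above could be gathered into one settings value along the following lines. This is a sketch under assumptions: the `ExampleTarget` struct and `target_from_lookup` function are hypothetical, the defaults are taken from the default table name quoted above, and treating only the exact value `1` as "set" is a guess about how the example interprets its flags.

```rust
use std::env;

/// Hypothetical settings for the write example (not part of arrow-tiberius).
#[derive(Debug, PartialEq, Eq)]
struct ExampleTarget {
    schema: String,
    table: String,
    existing_table: bool,
    keep_table: bool,
}

/// Build the settings from any variable lookup, so the logic can be tested
/// without touching the real process environment.
fn target_from_lookup(lookup: impl Fn(&str) -> Option<String>) -> ExampleTarget {
    // Assumption: a flag counts as enabled only when its value is exactly "1".
    let flag = |name: &str| lookup(name).as_deref() == Some("1");
    ExampleTarget {
        schema: lookup("ARROW_TIBERIUS_EXAMPLE_MSSQL_SCHEMA")
            .unwrap_or_else(|| "dbo".to_string()),
        table: lookup("ARROW_TIBERIUS_EXAMPLE_MSSQL_TABLE")
            .unwrap_or_else(|| "arrow_tiberius_example_write".to_string()),
        existing_table: flag("ARROW_TIBERIUS_EXAMPLE_EXISTING_TABLE"),
        keep_table: flag("ARROW_TIBERIUS_EXAMPLE_KEEP_TABLE"),
    }
}

fn main() {
    // Read from the real environment when run as a program.
    let target = target_from_lookup(|name| env::var(name).ok());
    println!("{target:?}");
}
```

Passing the lookup as a closure keeps the parsing pure, so defaults and overrides can be checked with a plain in-memory map.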
The v0.1 profile targets SQL Server 2016 with database compatibility level 100:
use arrow_tiberius::MssqlProfile;
let profile = MssqlProfile::sql_server_2016_compat_100();

See Integration Tests for the SQL Server validation path used by this repository.
arrow-tiberius depends on the published tiberius-raw-bulk package as the
crate name tiberius and owns that compatibility boundary internally:
tiberius = { package = "tiberius-raw-bulk", version = "=0.12.3-raw-bulk.13", default-features = false, features = [
    "tds73",
    "winauth",
    "native-tls",
] }

Downstream crates should normally depend only on arrow-tiberius and construct
SQL Server clients through the connected-client API:
[dependencies]
arrow-tiberius = "0.1"

Use connect_mssql_client_from_ado_string, ConnectedMssqlClient, and
ConnectedBulkWriter when lifecycle SQL and bulk loading must run on the same
connection. Depending on upstream tiberius separately pulls in a second,
distinct crate, so its types do not unify with the fork's: a client from
upstream tiberius is not the same type as a client from tiberius-raw-bulk and
will not match this crate's writer internals.
The fork exists because upstream Tiberius does not expose the raw bulk packet APIs needed by the optimized direct writer. The baseline writer and direct writer use the same forked package dependency; only the optimized backend calls the raw-row APIs.
| Feature | Default | Purpose |
|---|---|---|
| bench-profile | no | Enables benchmark-only direct write profiling hooks and forwards to tiberius/bulk-load-profile. |
| integration-tests | no | Enables SQL Server integration tests that require explicit environment setup or the xtask runner. |
Docs.rs is configured to build with all features so feature-gated public items are documented. Normal library use does not require either feature.
Default local validation does not require SQL Server:
cargo fmt --check
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo test --workspace

Run SQL Server integration tests through the xtask harness:

cargo xtask sqlserver-test

The harness starts SQL Server when possible, configures compatibility level 100, runs feature-gated integration tests, and cleans up managed resources. See Integration Tests for container runtime and existing-server options.
Writer benchmark commands and interpretation guidance are in Writer Benchmarks. The curated direct raw benchmark summary is in Direct Raw Benchmark Comparison.
arrow-odbc is the broader Arrow/ODBC crate. It
targets ODBC data sources generally and supports reading and writing Arrow
arrays through ODBC drivers. Use it when you need a database-agnostic ODBC path
or SQL-to-Arrow reads today.
arrow-tiberius is narrower: it targets Microsoft SQL Server through Tiberius
and focuses v0.1 on Arrow-to-SQL Server bulk writes. That narrower scope lets
the direct raw backend use SQL Server-specific TDS bulk-load encoding instead of
going through ODBC.
For the SQL Server write workloads this crate is built around, the local
benchmark data generally favors DirectRawBulk: it is much faster than
arrow-odbc on primitive and mixed nullable rows while using far less memory.
Representative runs show about 3.05x throughput on primitive numeric rows
with about 20 MiB peak RSS versus about 998 MiB, and about 1.66x throughput on
mixed nullable rows with about 21 MiB peak RSS versus about 157 MiB. The main
exception is some large variable-width text/binary workloads, where arrow-odbc
can write about 1.28x to 1.37x faster but with roughly 1.4 GiB peak RSS versus
about 100 MiB for DirectRawBulk. See
primitive direct raw comparison
and
variable-width direct raw comparison
for the measured numbers and setup.
arrow-tiberius is preparing its first v0.1 release. The v0.1 release focus is
Arrow-to-SQL Server writing. SQL Server-to-Arrow reading is reserved for a later
release.
See v0.1 Release Boundary for the maintainer release scope, gates, and publication checklist.