Craton Bolt
FAQ
Frequently asked questions about Craton Bolt. For the supported SQL surface
see SQL_REFERENCE.md; for planned work see
../ROADMAP.md.
Q1. Why no NVRTC or runtime LLVM?
NVRTC compiles CUDA C++ to PTX; Craton Bolt emits PTX directly from Rust, so
NVRTC would buy us nothing. To go PTX → SASS we call the driver's
cuModuleLoadDataEx, which assembles internally. The result is that the
only runtime dependency is the CUDA driver itself — no libnvrtc, no
bundled ptxas, no LLVM. NVRTC alone would add roughly 80 MB of shared
libraries to a deployment.
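To make "emits PTX directly from Rust" concrete, here is a minimal sketch of building a PTX module as text. The function name, the .version value, and the empty kernel are assumptions for illustration; this is not Craton Bolt's actual codegen. The resulting string is the kind of input the driver's cuModuleLoadDataEx accepts.

```rust
/// Hypothetical helper (not part of Craton Bolt): builds the text of a
/// minimal PTX module containing one empty kernel.
/// The `.version 7.0` directive is an assumption chosen for the sketch;
/// `.target sm_70` matches the floor discussed in this FAQ.
pub fn emit_noop_ptx(kernel_name: &str) -> String {
    let mut ptx = String::new();
    // Module-level directives come first in every PTX file.
    ptx.push_str(".version 7.0\n");
    ptx.push_str(".target sm_70\n");
    ptx.push_str(".address_size 64\n\n");
    // A kernel entry point with no parameters that returns immediately.
    ptx.push_str(&format!(
        ".visible .entry {}()\n{{\n    ret;\n}}\n",
        kernel_name
    ));
    ptx
}
```

Because the output is plain text, no compiler toolchain is needed at runtime; only the driver has to be present to turn it into SASS.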
Q2. Why no macOS support?
CUDA support on macOS ended in 2019: CUDA 10.2 was the last toolkit release
to support it, and no Mac that Apple currently ships has an NVIDIA GPU or
driver. You can still type-check
the crate on macOS with cargo check --features cuda-stub, but you
cannot run a kernel.
Q3. What's the status of async memcpy?
It's shipped. The FFI bindings for cuMemcpyAsync and pinned host
allocation landed in 0.3.0, and the safe wrappers (memcpy_h2d_async,
memcpy_d2h_async, memset_d8_async, PinnedHostBuffer<T>,
GpuBuffer::copy_{from,to}_async) followed. The scalar-aggregate
executor was the 0.6 pilot
(upload_primitive_values_async); 0.7 rolled async memcpy out to the
remaining GROUP BY variants (tier2 / shmem / wide / valid) and added
async D2H for compact::download_mask (the WHERE filter path). See
docs/JIT_PIPELINE.md for the staging history.
Q4. Can I share a GpuVec between threads?
GpuVec<T> is Send but not Sync. You can move ownership of a vec to
another thread, but only one thread at a time can take a GpuViewMut.
GpuView<'a, T> is Send + !Sync for the same reason — sharing an
immutable view across threads would let a sibling thread holding the
parent vec construct a GpuViewMut and race a writer kernel against
your reader.
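The Send-but-not-Sync combination can be sketched with a marker field. SketchVec below is a hypothetical stand-in, not the real GpuVec, and the PhantomData marker is only one common way to opt out of Sync; how Craton Bolt does it internally is not stated here.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::thread;

/// Hypothetical stand-in for a device-backed vector (not the real GpuVec).
/// `PhantomData<Cell<()>>` is `Send` but not `Sync`, so the wrapper inherits
/// exactly that combination when `T: Send`.
pub struct SketchVec<T> {
    data: Vec<T>,
    _not_sync: PhantomData<Cell<()>>,
}

impl<T> SketchVec<T> {
    pub fn new(data: Vec<T>) -> Self {
        Self { data, _not_sync: PhantomData }
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }
}

/// Compiles only when `T` is `Send`; used as a compile-time check.
pub fn assert_send<T: Send>() {}

/// Moves ownership of the vec into a worker thread and returns the length
/// computed there. Moving is allowed because the type is `Send`.
pub fn len_on_other_thread(v: SketchVec<u32>) -> usize {
    thread::spawn(move || v.len())
        .join()
        .expect("worker thread panicked")
}
```

Trying to share a &SketchVec<u32> between two threads instead (for example through std::thread::scope) is rejected at compile time, because the type is not Sync.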
Q5. What happens if I call Engine::sql on a batch with too many rows?
The n_rows_to_u32 helper errors when n_rows > u32::MAX. The PTX
kernels use .u32 for the row count and .s32 for the thread index, so
the engine refuses to launch with a row count that would overflow.
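The check can be sketched as follows. The function name, the u64 input type, and the error type are assumptions for illustration; the real n_rows_to_u32 signature and error variant may differ.

```rust
/// Hypothetical error type for this sketch; carries the offending row count.
#[derive(Debug, PartialEq)]
pub struct TooManyRows(pub u64);

/// Sketch of a row-count guard: returns the count as `u32`, or an error when
/// it exceeds `u32::MAX` and therefore cannot be passed to a kernel that
/// takes the row count as a 32-bit value.
pub fn n_rows_to_u32_sketch(n_rows: u64) -> Result<u32, TooManyRows> {
    if n_rows > u64::from(u32::MAX) {
        Err(TooManyRows(n_rows))
    } else {
        // Safe: the value was just checked to fit in 32 bits.
        Ok(n_rows as u32)
    }
}
```

Failing before launch keeps the error on the host side, where it can be reported as an ordinary Result instead of surfacing as a truncated row count on the device.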
Q6. How do I run on a specific GPU?
Use Engine::new_with_device(idx); Engine::new() picks device 0. There is
one CUDA context per engine, so a multi-GPU workload runs one engine per
device.
Q7. Why does SUM(int_col) return Int64 even when the column is Int32?
Widening. SUM(Int32) -> Int64 gives the accumulator headroom on long
columns; SUM(Int64) and SUM(Float32|Float64) are unchanged. The
widening is applied consistently in the scalar and GROUP BY paths via
crate::plan::logical_plan::sum_output_dtype. Note that widening is not a
substitute for overflow safety: if the i64 accumulator does overflow, the
query fails loudly with a BoltError::Type("SUM(integer) overflow") rather
than wrapping silently (the same applies to SUM(Decimal128)). See
SQL_REFERENCE.md and LIMITATIONS.md
for the full overflow semantics, including the grouped-SUM streaming caveat.
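The widening and fail-loudly behaviour can be illustrated on the host. This is a CPU-side sketch with hypothetical function names and a String error; the engine itself reports BoltError::Type and performs the accumulation on the GPU.

```rust
/// Sketch: sums Int32 values into an i64 accumulator (the widening step),
/// returning an error instead of wrapping if the accumulator overflows.
pub fn sum_i32_widened(values: &[i32]) -> Result<i64, String> {
    values.iter().try_fold(0i64, |acc, &v| {
        acc.checked_add(i64::from(v))
            .ok_or_else(|| "SUM(integer) overflow".to_string())
    })
}

/// Sketch: Int64 input is not widened further, so overflow is reachable
/// and must be detected explicitly.
pub fn sum_i64_checked(values: &[i64]) -> Result<i64, String> {
    values.iter().try_fold(0i64, |acc, &v| {
        acc.checked_add(v)
            .ok_or_else(|| "SUM(integer) overflow".to_string())
    })
}
```

Summing two i32::MAX values would overflow a 32-bit accumulator but fits comfortably in the widened i64 result, while adding 1 to i64::MAX produces the error path.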
Q8. Are SELECT t.col FROM t and SELECT COL FROM t (uppercase) accepted?
Yes — both work as of 0.5. Single-level qualified column references
(t.col, alias.col) resolve against the FROM-tree, including JOIN
aliases, in SELECT / WHERE / GROUP BY / HAVING / JOIN ... ON; only the
resolved column name survives lowering. Deeper qualifications
(db.t.col, struct-field access) are still rejected.
Identifiers are also case-insensitive: an unquoted SQL ident folds to
lowercase at parse time, and schema lookup falls back to a
case-insensitive match, so SELECT COL resolves to a column named
col. Quoted identifiers ("MyCol") preserve case and match verbatim.
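The identifier rules above can be sketched as a small lookup function. The name resolve_column and the slice-of-strings schema are assumptions for illustration, not the engine's actual resolver.

```rust
/// Sketch of identifier resolution against a list of column names.
/// - Quoted identifiers (`"MyCol"`) keep their case and must match verbatim.
/// - Unquoted identifiers fold to lowercase, try an exact match, then fall
///   back to a case-insensitive match.
pub fn resolve_column<'a>(raw: &str, schema: &[&'a str]) -> Option<&'a str> {
    let quoted = raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"');
    if quoted {
        // Strip the surrounding double quotes and compare byte-for-byte.
        let name = &raw[1..raw.len() - 1];
        return schema.iter().copied().find(|c| *c == name);
    }
    let folded = raw.to_lowercase();
    schema
        .iter()
        .copied()
        .find(|c| *c == folded)
        .or_else(|| {
            schema
                .iter()
                .copied()
                .find(|c| c.eq_ignore_ascii_case(&folded))
        })
}
```

With a schema of col and MyCol, the unquoted COL resolves to col, the quoted "MyCol" matches only the mixed-case column, and the quoted "mycol" matches nothing.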
Q9. Why is the CHANGELOG / NOTICE attribution "Craton Software Company"?
The project is licensed Apache-2.0 with org-level copyright attribution
to Craton Software Company. Individual contributors are covered by DCO
sign-off on commits (see CONTRIBUTING.md). The copyright line is a
formality of the Apache-2.0 NOTICE convention, not a CLA.
Q10. How do I report a security issue?
Email security@craton.com.ar. Do not file public GitHub issues
for vulnerabilities. See ../SECURITY.md for the full
disclosure policy.
Q11. Why does the codegen always target sm_70?
sm_70 (Volta, V100) is the realistic floor for serious GPU compute and
covers every instruction the JIT emits — atom.global.add.f64,
atom.global.cas.b64, shfl.sync.*, cvta.to.global.u64. Targeting
lower would lose shfl.sync and the f64 atomic add; targeting higher
would shrink the deployment surface for no codegen benefit. If you
change the floor, audit src/jit/float_atomics.rs first — the CAS-loop
pattern there exists because atom.global.{min,max}.f* is still
unavailable through sm_90.
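The CAS-loop pattern mentioned above can be shown with a host-side analogy. This sketch uses Rust's AtomicU64 on the CPU and is not the PTX that src/jit/float_atomics.rs emits; it only illustrates how a floating-point min is built from a 64-bit compare-and-swap when no native atomic min exists.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Host-side sketch of an atomic f64 min built from a 64-bit CAS loop.
/// `cell` stores the f64 as its raw bit pattern. Returns the value that was
/// observed before this call took effect.
pub fn atomic_min_f64(cell: &AtomicU64, candidate: f64) -> f64 {
    let mut observed = cell.load(Ordering::Relaxed);
    loop {
        let current = f64::from_bits(observed);
        if candidate < current {
            // Try to publish the smaller value; retry if another thread
            // changed the cell between our load and the swap.
            match cell.compare_exchange_weak(
                observed,
                candidate.to_bits(),
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return current,
                Err(actual) => observed = actual,
            }
        } else {
            // The stored value is already at least as small (or the
            // comparison involved NaN); nothing to write.
            return current;
        }
    }
}
```

The comparison is done on the decoded f64, not on the raw bits, because the bit patterns of negative floats do not order the same way as unsigned integers.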
Q12. Why does cargo bench show engine_execute as skipped?
The bench file gates GPU benches on the BOLT_BENCH_GPU=1
environment variable so that contributors without a GPU can still run
cargo bench for the planner / codegen / CPU-reference / Polars
comparisons. Set the variable to include the GPU path. See
DEVELOPMENT.md for the bench commands.
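An environment-variable gate of this kind can be sketched as below. The function names are hypothetical, and treating only the exact value "1" as enabled is an assumption; the actual bench file may accept other values.

```rust
/// Sketch: decides whether GPU benches should run, given the raw value of
/// the gating variable. Only the exact string "1" enables them here.
pub fn gpu_benches_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Sketch: reads BOLT_BENCH_GPU from the process environment. An unset or
/// non-UTF-8 variable counts as disabled.
pub fn gpu_benches_enabled_from_env() -> bool {
    gpu_benches_enabled(std::env::var("BOLT_BENCH_GPU").ok().as_deref())
}
```

Splitting the decision from the environment read keeps the rule testable without mutating process-wide state.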