
4 posts tagged with "Tutorial"


How to Export Data from SQL Server to Parquet Files with FastBCP

Pierre-Antoine Collet, ARPE.IO Developer
2026-02-17 · 5 min

Exporting data from SQL Server into a file format that's analytics-friendly is a common need: data lake ingestion, long-term storage, or feeding modern query engines.

In this tutorial, we'll export the TPC-H SF=10 orders table (~15M rows, 9 columns) from SQL Server into Parquet files using FastBCP.

Context

CSV is universal, but it's not always the best format for analytics:

  • larger files (no compression by default)
  • weak typing (everything is "text")
  • slower scans for columnar workloads

Parquet is a better fit for analytics pipelines:

  • columnar layout (faster scans)
  • typed columns
  • efficient compression
  • great interoperability (DuckDB, Spark, Trino, Polars…)

FastBCP makes it easy to export a SQL Server table into Parquet while leveraging parallelism on multi-core machines.

The dataset we will export (TPC-H SF=10 orders)

We'll export the TPC-H SF=10 dataset, table orders.

  • Table: orders
  • Scale: SF=10
  • Size: ~15 million rows
  • Columns: 9 columns
  • Source: tpch10.dbo.orders (SQL Server)
  • Output directory: D:\temp\tpch10\orders\parquet
Test environment specifications:
  • Model: MSI KATANA 15 HX B14W
  • OS: Windows 11
  • CPU: Intel(R) Core(TM) i7-14650HX @ 2200 MHz — 16 cores / 24 logical processors
  • SQL Server 2022

Source table in SQL Server

SELECT COUNT(*)
FROM dbo.orders;

SELECT TOP 10 *
FROM dbo.orders;

Orders table in SQL Server

What is FastBCP?

FastBCP is a command-line tool focused on fast exports from SQL Server to files, with a strong emphasis on parallelism.

For SQL Server → Parquet exports, FastBCP is interesting because it can:

  • export large tables efficiently
  • split the export across multiple workers (parallel tiles)
  • write a modern analytics format (Parquet)
  • optionally keep the outputs as separate files (or merge them if needed)

Exporting orders to Parquet files with FastBCP

To build the command line, we used the FastBCP Wizard: https://fastbcp.arpe.io/docs/latest/wizard

Here is the exact command we used to export the 15 million rows into Parquet:

.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost,11433" `
--trusted `
--database "tpch10" `
--sourceschema "dbo" `
--sourcetable "orders" `
--fileoutput "orders.parquet" `
--directory "D:\temp\tpch10\orders\parquet" `
--parallelmethod "Ntile" `
--paralleldegree -2 `
--distributekeycolumn "o_orderkey" `
--merge false

A few things to highlight:

  • --parallelmethod "Ntile" splits the table into multiple tiles for parallel export.
  • --paralleldegree -2 uses (number of cores / 2) workers. This machine exposes 24 logical processors, so -2 means 12 parallel workers (see the conceptual sketch after this list).
  • With Ntile + 12 workers, the export is naturally split into 12 partitions, and with --merge false, FastBCP writes 12 output Parquet files (one per partition).
  • --distributekeycolumn "o_orderkey" is a great key for partitioning TPC-H orders.
  • --merge false keeps the output as multiple Parquet files, which is often ideal for downstream parallel processing.
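
As a rough mental model (this is an illustration only, not FastBCP's actual internal SQL), an Ntile-style split can be pictured like this in T-SQL:

-- Conceptual illustration: assign each row to one of 12 buckets on the
-- distribution key; each worker then exports one bucket.
WITH tiled AS (
    SELECT o.*,
           NTILE(12) OVER (ORDER BY o_orderkey) AS tile_id
    FROM dbo.orders AS o
)
SELECT *
FROM tiled
WHERE tile_id = 1;  -- worker 1 of 12; the other workers filter on tile_id 2..12

In practice a tool would typically compute the key ranges once and issue simple range predicates per worker rather than re-evaluating NTILE for each one.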

1) Running the FastBCP command from PowerShell

FastBCP command

2) FastBCP completion output (elapsed time and summary)

FastBCP result

3) Verifying the Parquet files generated

Parquet files

Results

On this machine, the export completed in 12.6 seconds, writing ~15 million rows (9 columns) from SQL Server into 12 Parquet files.
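
To double-check the output, any Parquet-aware engine can read the 12 files as a single dataset. For example, with DuckDB (paths assumed to match the --directory used above):

-- Read all 12 Parquet files as one dataset and verify the row count
SELECT count(*) AS row_count
FROM read_parquet('D:\temp\tpch10\orders\parquet\*.parquet');

-- Peek at a few rows to sanity-check column types and values
SELECT *
FROM read_parquet('D:\temp\tpch10\orders\parquet\*.parquet')
LIMIT 10;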

For more details on all available options (parallel methods, degrees, mapping, sources, output format), see the documentation: https://fastbcp.arpe.io/docs/latest/

Conclusion

If your goal is analytics-friendly exports (data lake, BI, query engines), Parquet is usually a better target than CSV.

With FastBCP, exporting a large SQL Server table into multiple Parquet files becomes a one-command workflow—while taking advantage of parallelism on multi-core machines.


How to Export Data from SQL Server to CSV Files with FastBCP

Pierre-Antoine Collet, ARPE.IO Developer
2026-02-13 · 7 min

Exporting data from SQL Server to CSV is one of those tasks that feels straightforward… until the table gets large and you want the export to finish fast while fully using your machine.

In this tutorial, we'll start with the traditional approach (BCP) and then export the same dataset using FastBCP, a more powerful option when you want to leverage parallelism.

Context

The classic tool for exporting data to files in the SQL Server ecosystem is BCP. It's stable, widely used, and easy to script.

But when your exports become heavy (large tables, tight windows, multi-core machines), a single-threaded pipeline can leave performance on the table.

That's where FastBCP comes in: it keeps the simplicity of a CLI workflow, while adding parallel execution and a pipeline designed for massive throughput.

The dataset we will export (TPC-H SF=10 orders)

We'll export the TPC-H SF=10 dataset, table orders.

  • Table: orders
  • Scale: SF=10
  • Size: ~15 million rows
  • Columns: 9 columns
  • Source: tpch10.dbo.orders (SQL Server)
  • Output directory: D:\temp\tpch10\orders\csv
Test environment specifications:
  • Model: MSI KATANA 15 HX B14W
  • OS: Windows 11
  • CPU: Intel(R) Core(TM) i7-14650HX @ 2200 MHz — 16 cores / 24 logical processors
  • SQL Server 2022

Source table in SQL Server

SELECT COUNT(*)
FROM dbo.orders;

SELECT TOP 10 *
FROM dbo.orders;

Orders table in SQL Server

What is BCP?

BCP (Bulk Copy Program) is Microsoft's command-line utility for bulk data import/export with SQL Server.

Why people like it:

  • it's native to the SQL Server ecosystem
  • it's simple and script-friendly
  • it's been used in production for years

Docs (Microsoft Learn): https://learn.microsoft.com/fr-fr/sql/relational-databases/import-export/import-and-export-bulk-data-by-using-the-bcp-utility-sql-server?view=sql-server-ver17

Exporting orders with BCP

BCP can export a table (or query) to a file. Here is a practical baseline command:

bcp "SELECT o_orderdate, o_orderkey, o_custkey, o_orderpriority, o_shippriority, o_clerk, o_orderstatus, o_totalprice, o_comment FROM tpch10.dbo.orders" queryout "D:\temp\tpch10\orders\single_csv\orders.csv" `
-S "localhost,11433" `
-T `
-c `
-t "|" `
-r "\n"

Notes:

  • Using queryout gives you full control over the column order.
  • This exports into a single file (orders.csv).
  • If you want a header row, you typically have to add it yourself (BCP doesn't automatically emit headers for queryout); one possible workaround is sketched right after this list.
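
One way to add the header (a sketch, not the only approach) is to emit it from the query itself: UNION a row of literal column names with the data, cast every column to text, and force the header row to sort first.

-- Sketch: make bcp queryout produce a header line
-- (assumes o_orderdate is a date column; adjust the casts to your schema)
SELECT o_orderdate, o_orderkey, o_custkey, o_orderpriority, o_shippriority,
       o_clerk, o_orderstatus, o_totalprice, o_comment
FROM (
    SELECT 0 AS _sort,
           'o_orderdate' AS o_orderdate, 'o_orderkey' AS o_orderkey,
           'o_custkey' AS o_custkey, 'o_orderpriority' AS o_orderpriority,
           'o_shippriority' AS o_shippriority, 'o_clerk' AS o_clerk,
           'o_orderstatus' AS o_orderstatus, 'o_totalprice' AS o_totalprice,
           'o_comment' AS o_comment
    UNION ALL
    SELECT 1,
           CONVERT(varchar(10), o_orderdate, 120),
           CAST(o_orderkey AS varchar(20)),
           CAST(o_custkey AS varchar(20)),
           o_orderpriority,
           CAST(o_shippriority AS varchar(10)),
           o_clerk,
           o_orderstatus,
           CAST(o_totalprice AS varchar(25)),
           o_comment
    FROM tpch10.dbo.orders
) AS x
ORDER BY x._sort;

Passing this query to bcp ... queryout in place of the plain SELECT writes the header as the first line of the file.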

1) Running the BCP command from PowerShell

BCP command

2) BCP completion output (elapsed time and summary)

BCP result

3) Verifying the CSV file generated

BCP export

On this machine, the export of ~15 million rows (9 columns) from SQL Server into a single CSV file completed in 62 seconds.

What is FastBCP?

FastBCP is a command-line tool focused on fast exports from SQL Server to files, with a strong emphasis on parallelism.

For SQL Server → CSV exports, FastBCP is interesting because it can:

  • export large tables efficiently
  • split the export across multiple workers (parallel tiles)
  • optionally keep the outputs as separate files (or merge them if needed)

Exporting orders to CSV files with FastBCP

To build the command line, we used the FastBCP Wizard: https://fastbcp.arpe.io/docs/latest/wizard

Here is the exact command we used to export the 15 million rows (in parallel) from SQL Server into 12 CSV files:

.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost,11433" `
--trusted `
--database "tpch10" `
--sourceschema "dbo" `
--sourcetable "orders" `
--fileoutput "orders.csv" `
--directory "D:\temp\tpch10\orders\csv" `
--parallelmethod "Ntile" `
--paralleldegree -2 `
--distributekeycolumn "o_orderkey" `
--merge false

A few things to highlight:

  • --parallelmethod "Ntile" splits the table into multiple tiles for parallel export.
  • --paralleldegree -2 uses (number of cores / 2) workers. This machine exposes 24 logical processors, so -2 means 12 parallel workers.
  • With Ntile + 12 workers, the export is naturally split into 12 partitions, and with --merge false, FastBCP writes 12 output CSV files (one per partition).
  • --distributekeycolumn "o_orderkey" is a great key for partitioning TPC-H orders.
  • --merge false keeps the output as multiple files (one per partition), which is often ideal for downstream parallel ingestion.

1) Running the FastBCP command from PowerShell

FastBCP command

2) FastBCP completion output (elapsed time and summary)

FastBCP result

3) Verifying the CSV files generated

FastBCP export

On this machine, the export completed in 6.6 seconds, writing ~15 million rows (9 columns) from SQL Server into 12 CSV files.
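
As with any export, it's worth confirming that the 12 files add up to the expected row count. A quick check, assuming DuckDB is available and the output directory is the one used above:

-- Treat the 12 output CSV files as a single dataset (format is auto-detected)
SELECT count(*) AS row_count
FROM read_csv('D:\temp\tpch10\orders\csv\*.csv');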

For more details on all available options (parallel methods, degrees, mapping, sources, output format), see the documentation: https://fastbcp.arpe.io/docs/latest/

Conclusion

If you only need a quick export and you're fine with a traditional workflow, BCP is still a solid option.

But if you want fast exports of large tables and want to make full use of your CPU through parallelism, FastBCP is a compelling approach, especially when exporting into multiple CSV files for downstream parallel processing.


How to Transfer Data from SQL Server into PostgreSQL with FastTransfer

Pierre-Antoine Collet, ARPE.IO Developer
2026-02-11 · 6 min

Moving data from SQL Server to PostgreSQL is a pretty common task: migrations, analytics pipelines, or just keeping environments in sync.

In this tutorial, we'll transfer the TPC-H SF=10 orders table (~15M rows, 9 columns) from SQL Server into PostgreSQL using FastTransfer (fasttransfer.arpe.io).

Context

A lot of SQL Server → Postgres workflows start with an intermediate CSV export:

  • export to CSV
  • move files around
  • import with COPY

It works, but it adds friction: temporary files, escaping/format issues, and more moving parts.

FastTransfer avoids all of that by streaming the data directly from SQL Server to PostgreSQL, while still using the fastest native mechanisms on each side.

The dataset we will transfer (TPC-H SF=10 orders)

We'll use the TPC-H SF=10 dataset, table orders.

  • Table: orders
  • Scale: SF=10
  • Size: ~15 million rows
  • Columns: 9 columns
  • Source: tpch10.dbo.orders (SQL Server)
  • Target: postgres.public.orders (PostgreSQL)
Test environment specifications:
  • Model: MSI KATANA 15 HX B14W
  • OS: Windows 11
  • CPU: Intel(R) Core(TM) i7-14650HX @ 2200 MHz — 16 cores / 24 logical processors
  • SQL Server 2022
  • PostgreSQL 16

Source table in SQL Server

SELECT COUNT(*) AS rows_source
FROM dbo.orders;

SELECT TOP 10 *
FROM dbo.orders;

Orders table in SQL Server

Target table in PostgreSQL

Make sure public.orders exists in PostgreSQL with a compatible schema (same column names/types).

CREATE TABLE public.orders (
    o_orderdate date NULL,
    o_orderkey int8 NOT NULL,
    o_custkey int8 NOT NULL,
    o_orderpriority bpchar(15) NULL,
    o_shippriority int4 NULL,
    o_clerk bpchar(15) NULL,
    o_orderstatus bpchar(1) NULL,
    o_totalprice money NULL,
    o_comment varchar(79) NULL
)
WITH (
    parallel_workers = 24
);

Create table in Postgres

What is FastTransfer?

FastTransfer (fasttransfer.arpe.io) is a command-line tool built for high-performance data movement (file → DB, DB → DB), designed to maximize throughput with streaming and parallel execution.

For SQL Server → PostgreSQL, FastTransfer is interesting because it can:

  • read from SQL Server efficiently (mssql)
  • load into PostgreSQL using the COPY protocol (pgcopy)
  • split work across threads to better use the machine

Transferring orders from SQL Server to PostgreSQL

To build the command line, we used the FastTransfer Wizard: https://fasttransfer.arpe.io/docs/latest/wizard

Here is the exact command we used:

.\FastTransfer.exe `
--sourceconnectiontype mssql `
--sourceserver "localhost,11433" `
--sourcedatabase "tpch10" `
--sourcetrusted `
--sourceschema "dbo" `
--sourcetable "orders" `
--targetconnectiontype pgcopy `
--targetserver "localhost:25433" `
--targetdatabase "postgres" `
--targetuser "postgres" `
--targetpassword "postgres" `
--targetschema "public" `
--targettable "orders" `
--loadmode Truncate `
--mapmethod Name `
--method Ntile `
--degree -4 `
--distributekeycolumn "o_orderkey"

A few things to highlight:

  • --targetconnectiontype pgcopy uses PostgreSQL's COPY protocol (fast path for bulk loads).
  • --loadmode Truncate clears the target table before loading (clean, repeatable runs).
  • --mapmethod Name maps columns by name (safer than positional mapping).
  • --method Ntile enables parallel execution by splitting the dataset into tiles.
  • --degree -4 uses "number of cores/4" threads (leaves some CPU for the system).
  • o_orderkey is a great distribution key for TPC-H orders (high cardinality, stable).

1) Running the FastTransfer command from PowerShell

FastTransfer import command

2) FastTransfer completion output (elapsed time and summary)

FastTransfer import result

3) Verifying the number of imported rows in Postgres

Once the transfer completes, validate the row count:

SELECT COUNT(*)
FROM public.orders;

SELECT *
FROM public.orders
LIMIT 10;

FastTransfer row count validation

On this machine, the transfer of the orders table from SQL Server completed in 16 seconds, successfully loading ~15 million rows with 9 columns into PostgreSQL.
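
Row counts catch missing data; a cheap aggregate comparison on both sides also helps catch truncation or conversion issues. A sketch (the exact totals depend on your TPC-H build):

-- On SQL Server (source)
SELECT COUNT(*) AS rows_source,
       SUM(o_totalprice) AS totalprice_source
FROM dbo.orders;

-- On PostgreSQL (target); o_totalprice is money here, so cast it for a numeric sum
SELECT COUNT(*) AS rows_target,
       SUM(o_totalprice::numeric) AS totalprice_target
FROM public.orders;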

For more details on all available options (parallel methods, degrees, mapping, sources/targets), see the documentation: https://fasttransfer.arpe.io/docs/latest/

Conclusion

If you've been moving SQL Server data to PostgreSQL via CSV exports, FastTransfer is a nice upgrade:

  • direct DB → DB streaming (no intermediate files)
  • uses native fast paths (mssql + Postgres COPY)
  • optional parallelism for higher throughput



How to Import CSV Files into SQL Server with FastTransfer

Pierre-Antoine Collet, ARPE.IO Developer
2026-02-09 · 10 min

CSV imports are one of those tasks that look trivial… until the files get large, the load window gets tight, and your server is clearly capable of doing more.

In this tutorial, we'll start with the traditional approach (BCP) and then show how to import the same data using FastTransfer, a more powerful option when you want to leverage parallelism and a modern, flexible import pipeline.

Context

SQL Server has excellent bulk-loading capabilities, and Microsoft provides tooling around them. For many teams, BCP is the default choice: stable, widely known, easy to script.

But when you deal with very large datasets, two things tend to happen:

  • you want better CPU utilization (multi-core machines shouldn't look idle during a load)
  • you want more flexibility in how you parse and stream data from files (CSV quirks are real)

That's where FastTransfer comes in: it keeps the simplicity of a CLI workflow, while adding parallel execution and a pipeline designed for massive throughput.

The dataset we will load (TPC-H SF=10 orders)

We'll use the TPC-H SF=10 dataset, table orders.

  • Table: orders
  • Scale: SF=10
  • Size: ~15 million rows
  • Columns: 9 columns
  • Input files: 12 CSV files, ~145 MB each
Test environment specifications:
  • Model: MSI KATANA 15 HX B14W
  • OS: Windows 11
  • CPU: Intel(R) Core(TM) i7-14650HX @ 2200 MHz — 16 cores / 24 logical processors
  • SQL Server 2022

File layout

Our orders data is split into 12 files (for easier handling and to naturally enable parallel processing later on):

TPC-H orders CSV files

CSV format

The files are pipe-delimited (|) and contain a header row (column names on the first line).

Here is a small extract showing the column order, the delimiter, and sample rows:

o_orderdate|o_orderkey|o_custkey|o_orderpriority|o_shippriority|o_clerk|o_orderstatus|o_totalprice|o_comment
1993-03-18|1127520|1330177|4-NOT SPECIFIED|0|Clerk#000004945|F|79371.45|even requests boost according to the boldly ironic foxes.
1993-12-12|1127557|742396|4-NOT SPECIFIED|0|Clerk#000003450|F|236270.35|blithely final deposits
1993-06-08|1127651|1334107|4-NOT SPECIFIED|0|Clerk#000001896|F|140919.34|fluffily ironic ideas wake fluffily above t
1994-02-24|1127654|716249|4-NOT SPECIFIED|0|Clerk#000002901|F|118316.89|stealthily regular theodolites cajole blit
1992-04-21|1127778|534959|4-NOT SPECIFIED|0|Clerk#000007212|F|73935.49|accounts detect furiou

Notes:

  • Delimiter: |
  • o_totalprice uses a dot as decimal separator in this extract (e.g., 79371.45) — keep this in mind when configuring parsing/conversion.
  • The o_comment field is free text and may contain spaces.

Target table in SQL Server

Make sure dbo.orders exists in your target database (example: tpch10) with a schema matching your CSV.

TPC-H orders create table
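
The exact DDL used for this test is shown in the image above. If you need a starting point, here is a plausible definition (the types are assumptions based on standard TPC-H conventions and the column order of the CSV files; adjust them to match your own data):

CREATE TABLE dbo.orders (
    o_orderdate     date           NULL,
    o_orderkey      bigint         NOT NULL,
    o_custkey       bigint         NOT NULL,
    o_orderpriority char(15)       NULL,
    o_shippriority  int            NULL,
    o_clerk         char(15)       NULL,
    o_orderstatus   char(1)        NULL,
    o_totalprice    decimal(15, 2) NULL,
    o_comment       varchar(79)    NULL
);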

What is BCP?

BCP (Bulk Copy Program) is Microsoft's command-line utility for bulk data import/export with SQL Server.

Why people like it:

  • it's native to the SQL Server ecosystem
  • it's simple and script-friendly
  • it's been used in production for years

Docs (Microsoft Learn): https://learn.microsoft.com/fr-fr/sql/relational-databases/import-export/import-and-export-bulk-data-by-using-the-bcp-utility-sql-server?view=sql-server-ver17

A practical BCP import command

BCP is designed to load one data file at a time: the in argument points to a single file, and wildcards like orders*.csv are not supported as an input file.

Because our orders dataset is split into 12 CSV files, we first merged them into a single file for the BCP workflow (keeping only one header line).
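
One convenient way to do such a merge (an example, not necessarily how it was done for this test) is to let DuckDB read the 12 pipe-delimited files and write a single file with one header line:

-- Merge the 12 CSV files into one file, keeping a single header line
COPY (
    SELECT *
    FROM read_csv('D:\temp\tpch10\orders\csv\*.csv', delim='|', header=true)
)
TO 'D:\temp\tpch10\orders\single_csv\orders.csv'
(FORMAT CSV, HEADER, DELIMITER '|');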

Example for a pipe-delimited file:

bcp tpch10.dbo.orders in "D:\temp\tpch10\orders\single_csv\orders.csv" `
-S "localhost,11433" `
-T `
-c `
-t "|" `
-r "\n" `
-F 2 `
-b 100000

Notes:

  • -F 2 skips the header row (line 1), since the merged file contains column names on the first line.
  • Update the path and server/port to match your environment.

Here are the screenshots from the run:

1) Running the BCP command from PowerShell

BCP import command

2) BCP completion output (elapsed time and summary)

BCP import result

3) Verifying the number of imported rows in SQL Server

Row count validation

On this machine, the import completed in 21 seconds for the merged orders.csv, successfully loading ~15 million rows with 9 columns into SQL Server.

At this stage, always validate the row count (and optionally a few spot checks on values) before moving on to the next steps.
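
For example, against the target database (tpch10 here):

-- Total rows loaded (expected: ~15 million for TPC-H SF=10 orders)
SELECT COUNT(*) AS rows_loaded
FROM dbo.orders;

-- Quick spot check on a few values
SELECT TOP 10 *
FROM dbo.orders
ORDER BY o_orderkey;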

What is FastTransfer?

FastTransfer (fasttransfer.arpe.io) is a command-line tool built for high-performance data movement (file → DB, DB → DB), designed to maximize throughput with streaming and parallel execution.

For CSV → SQL Server, FastTransfer is interesting because it can:

  • parse CSV efficiently (including "real-world CSV" options)
  • stream rows to SQL Server using bulk loading (msbulk)
  • split work across threads to better use the machine

Importing the orders CSV files into SQL Server

To build the command line, we used the FastTransfer Wizard: https://fasttransfer.arpe.io/docs/latest/wizard

Here is the exact command we used to import the 12 CSV files (in parallel) into SQL Server:

.\FastTransfer.exe `
--sourceconnectiontype duckdbstream `
--sourceserver ":memory:" `
--query "SELECT * exclude filename FROM read_csv('D:\temp\tpch10\orders\csv\*.csv', filename=true)" `
--targetconnectiontype msbulk `
--targetserver "localhost,11433" `
--targetdatabase "tpch10" `
--targettrusted `
--targetschema "dbo" `
--targettable "orders" `
--loadmode Truncate `
--mapmethod Name `
--method DataDriven `
--degree 12 `
--distributekeycolumn "filename" `
--datadrivenquery "select file from glob('D:\temp\tpch10\orders\csv\*.csv')"

A few things to highlight:

  • We read all files via a glob (D:\temp\tpch10\orders\csv\*.csv).
  • We enable filename=true and use it as a distribution key so the work can be split across files.
  • The DataDriven method combined with --degree 12 lets FastTransfer process multiple partitions in parallel. Both DuckDB queries can be previewed on their own, as sketched after this list.
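
Since the source side runs on DuckDB, here is what a standalone preview of those two queries looks like (for instance in the DuckDB CLI), before launching the full transfer:

-- The data-driven query: one row per input file, which drives the 12 partitions
SELECT file
FROM glob('D:\temp\tpch10\orders\csv\*.csv');

-- The source query, limited to a few rows for a quick preview
SELECT * exclude filename
FROM read_csv('D:\temp\tpch10\orders\csv\*.csv', filename=true)
LIMIT 10;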

1) Running the FastTransfer command from PowerShell

FastTransfer import command

2) FastTransfer completion output (elapsed time and summary)

FastTransfer import result

3) Verifying the number of imported rows in SQL Server

FastTransfer row count validation

On this machine, the transfer completed in 6.7 seconds, successfully loading ~15 million rows with 9 columns into SQL Server.

For more details on all available options (parallel methods, degrees, mapping, sources/targets), see the documentation: https://fasttransfer.arpe.io/docs/latest/

Conclusion

If you want the simplest and most traditional approach, BCP is still a solid tool and often "good enough".

But if you're importing very large CSV files and want a pipeline designed to go further, especially by leveraging parallelism, then FastTransfer is a compelling option.
