Description
What happens?
When I increase the thread count (my failing example is 64 threads on a 10-core machine), reading Parquet files from S3 fails with the following error:
IOException: IO Error: Could not resolve hostname error for HTTP GET
This is for duckdb version 1.5.1.
A thread count of ~8 always works; a thread count of 32 eventually fails, but takes longer to do so.
NOTE: this formerly worked fine with versions <=1.4. We had even tested up to 128-256 threads, and 64 was a performance sweet spot for this particular operation.
To Reproduce
To reproduce, I set AWS credentials via environment variables (I also saw the same error with the SSO credential chain).
This works ✅ :
import duckdb
conn = duckdb.connect()
conn.execute("install httpfs; load httpfs;")
print(conn.query("""
select count(*) from read_parquet(
's3://<bucket>/<path>/**/*.parquet',
hive_partitioning=true,
filename=true
)
limit 3;
"""))
This throws an error 🚫 :
import duckdb
conn = duckdb.connect()
conn.execute("install httpfs; load httpfs;")
conn.execute("SET threads = 64;") #<--------------------
print(conn.query("""
select count(*) from read_parquet(
's3://<bucket>/<path>/**/*.parquet',
hive_partitioning=true,
filename=true
)
limit 3;
"""))
Error:
---------------------------------------------------------------------------
IOException Traceback (most recent call last)
Cell In[3], line 8
4 conn.execute("install httpfs; load httpfs;")
6 conn.execute("SET threads = 64;")
----> 8 print(conn.query("""
9 select count(*) from read_parquet(
10 's3://<bucket>/<path>/**/*.parquet',
11 hive_partitioning=true,
12 filename=true
13 )
14 limit 3;
15 """))
IOException: IO Error: Could not resolve hostname error for HTTP GET to 'https://<bucket>.s3.us-east-1.amazonaws.com/<path>/year%3D2025/month%3D02/day%3D03/3fbf80ad-afad-40a4-b3bb-0cbe7fba6076-0.parquet'
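To narrow down where the failures start (~8 works, 32 fails eventually, 64 fails), a sweep over thread counts could look like the sketch below. The helper names `first_failing_thread_count` and `run_s3_count` are hypothetical and not part of DuckDB, and the number of attempts per thread count is an arbitrary assumption, since the failure at 32 threads is intermittent.

```python
from typing import Callable, Iterable, Optional


def first_failing_thread_count(
    run_with_threads: Callable[[int], object],
    thread_counts: Iterable[int],
    attempts: int = 3,
) -> Optional[int]:
    """Return the smallest thread count at which the query raises, or None.

    Each thread count is tried `attempts` times because the failure
    can be intermittent at lower thread counts.
    """
    for threads in sorted(thread_counts):
        for _ in range(attempts):
            try:
                run_with_threads(threads)
            except Exception as exc:  # IOException in the traceback above
                print(f"threads={threads}: failed with {type(exc).__name__}: {exc}")
                return threads
        print(f"threads={threads}: ok ({attempts} attempts)")
    return None


def run_s3_count(threads: int):
    """Run the query from this report on a fresh connection (hypothetical helper)."""
    import duckdb  # imported lazily so the sweep helper itself has no dependency

    conn = duckdb.connect()
    conn.execute("install httpfs; load httpfs;")
    conn.execute(f"SET threads = {int(threads)};")
    return conn.query("""
        select count(*) from read_parquet(
            's3://<bucket>/<path>/**/*.parquet',
            hive_partitioning=true,
            filename=true
        )
        limit 3;
    """).fetchall()


# Example (requires S3 credentials and a real bucket/path in the query):
# first_failing_thread_count(run_s3_count, [8, 16, 32, 64])
```

Using a fresh connection per run keeps each attempt independent, so one failing run does not affect the next.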
OS:
OSX
DuckDB Package Version:
1.5.1
Python Version:
3.12.6
Full Name:
Graham Hukill
Affiliation:
MIT Libraries
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - I cannot easily share my data sets due to their large size
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration to reproduce the issue?
- Yes, I have