From ChatGPT:

- CSV (Comma-Separated Values): CSV is a simple and widely supported file format for tabular data. It is human-readable and easy to import into applications like spreadsheets and databases, but files can get large and it is not the most efficient option for very large datasets.

- JSON (JavaScript Object Notation): JSON is a lightweight, flexible format for structured data, commonly used for web APIs and data interchange. It is human-readable, widely supported, and can handle complex nested data structures, but it is not particularly space-efficient either.

- Parquet: Parquet is a columnar storage format designed for big-data frameworks like Apache Hadoop and Apache Spark. Its compression and encoding can significantly reduce file size and improve query performance, especially for analytics workloads, and it is optimized for column-wise access on large datasets.

- Avro: Avro is a compact, efficient, self-describing binary format developed by Apache. It supports schema evolution, so fields can be added, removed, or modified without breaking compatibility with existing data. It is suitable for large datasets and is commonly used in big-data processing pipelines.

- HDF5 (Hierarchical Data Format): HDF5 is a versatile format for storing and organizing large, complex datasets. It supports hierarchical structures, compression, and chunking for efficient access to subsets of the data, and is widely used in scientific and numerical applications.

- Apache Arrow: Arrow is an in-memory columnar data format that aims to provide a common data representation across programming languages and systems. It is designed for high-performance analytics and efficient data exchange between different frameworks and tools.

I'm not sure what the best choice here would be, but I think JSON is both more robust than CSV (e.g. against CRLF/CR line breaks and special characters) and easier to handle (via JsonConverters). I'm less sure about the big-data aspect, though. Have you seen many threads here about timeouts when handling many pages, or is this really a question about quota limitations?
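To make the CSV-vs-JSON point a bit more concrete, here is a minimal sketch (Python standard library only, not the .NET JsonConverters I mentioned, and with made-up "Title"/"Comment" fields just for illustration): the same record with an embedded CRLF and special characters round-trips cleanly through JSON, while in CSV everything depends on the writer quoting correctly and the reader not splitting on line breaks first.

```python
# Minimal sketch: the same record written as JSON and as CSV, to show why
# embedded line breaks and special characters are less fragile in JSON.
import csv
import io
import json

record = {
    "Title": "Status report\r\nwith a CRLF line break",
    "Comment": 'Contains "quotes", commas, and ümlauts',
}

# --- JSON: line breaks and quotes are escaped automatically ---
json_text = json.dumps(record, ensure_ascii=False)
print(json_text)
assert json.loads(json_text) == record  # round-trips unchanged

# --- CSV: works only if the writer quotes every field correctly ---
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=record.keys(), quoting=csv.QUOTE_ALL)
writer.writeheader()
writer.writerow(record)
csv_text = buffer.getvalue()
print(csv_text)

# A naive line-based split sees three lines instead of header + one row,
# because the CRLF inside the quoted field looks like a row boundary:
print(len(csv_text.splitlines()))  # prints 3
```

A strict CSV parser (like Python's csv.reader) does read the quoted line break back correctly, but in my experience it is exactly this quoting and escaping that varies between tools, whereas JSON serializers all agree on it.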
Florian