Array-Object vs Array-Value: A Performance Analysis for High-Volume Data

When designing high-performance data systems, the choice of data structure is a critical factor that directly impacts both memory footprint and network throughput. Two common patterns for representing collections of data are Array-Object (a list of objects, each with named keys) and Array-Value (a list of arrays, where each entry represents a row of raw values).

While the Array-Object pattern is often preferred for its readability and self-documenting nature, it introduces significant overhead because every object repeats the same keys. The Array-Value format strips away these repeated keys and is therefore more compact, particularly for large datasets.

For context, here is an example of array-object and array-value:

// Array object
[
  {"id": 1, "name": "Alice", "role": "Admin"},
  {"id": 2, "name": "Bob", "role": "User"},
  {"id": 3, "name": "Charlie", "role": "User"}
]

// Array Value
{
  "headers": ["id", "name", "role"],
  "data": [
    [1, "Alice", "Admin"],
    [2, "Bob", "User"],
    [3, "Charlie", "User"]
  ]
}
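
To make the relationship between the two shapes concrete, here is a minimal sketch of converting one into the other. The function names to_array_value and to_array_object are hypothetical helpers for illustration, and the sketch assumes flat records that all share the same keys.

```python
import json


def to_array_value(records):
    """Convert a list of flat dicts into a headers + rows structure."""
    if not records:
        return {"headers": [], "data": []}
    # Assumption: every record has the same keys as the first one.
    headers = list(records[0].keys())
    data = [[record[key] for key in headers] for record in records]
    return {"headers": headers, "data": data}


def to_array_object(table):
    """Rebuild the list of dicts from a headers + rows structure."""
    headers = table["headers"]
    return [dict(zip(headers, row)) for row in table["data"]]


records = [
    {"id": 1, "name": "Alice", "role": "Admin"},
    {"id": 2, "name": "Bob", "role": "User"},
    {"id": 3, "name": "Charlie", "role": "User"},
]

table = to_array_value(records)
print(json.dumps(table))
# The round trip should give back the original records.
print(to_array_object(table) == records)
```

The conversion is lossless for flat records, so a consumer can rebuild the keyed objects on arrival if it prefers working with them.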

In theory, array-value should be smaller than array-object, but by how much? To find out, I generated the same data in both formats and compared the sizes.

  1. Methodology: I compared data size across different numbers of rows and columns. Nested objects are out of scope for this analysis.
  2. Data generation: I used Faker to generate random data and avoid repeated values.
  3. Comparison metric: each dataset is compared by its size in bytes.
  4. Tools: Python, Faker, and a few supporting libraries.
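
The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact script behind the measurements: it swaps Faker for the standard library's random module so it runs without extra dependencies, and generate_records, byte_size, reduction_percent, and the field_N column names are all hypothetical. Because the generated values differ from Faker data, the percentages it prints will not match the figures reported in this post.

```python
import json
import random
import string


def random_word(rng, length=8):
    """Return a random lowercase string (a stand-in for a Faker value)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_records(n_rows, n_cols, seed=0):
    """Build n_rows flat records with n_cols string fields each."""
    rng = random.Random(seed)
    # Hypothetical column names; real data would have its own keys.
    headers = [f"field_{i}" for i in range(n_cols)]
    return [{h: random_word(rng) for h in headers} for _ in range(n_rows)]


def byte_size(payload):
    """Size in bytes of the payload serialized as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def reduction_percent(records):
    """Percentage saved by array-value relative to array-object."""
    headers = list(records[0].keys())
    table = {"headers": headers, "data": [list(r.values()) for r in records]}
    object_size = byte_size(records)
    value_size = byte_size(table)
    return 100 * (object_size - value_size) / object_size


for n_rows in (10, 1_000):
    for n_cols in (5, 10, 17):
        pct = reduction_percent(generate_records(n_rows, n_cols))
        print(f"{n_rows:>6} rows x {n_cols:>2} cols: {pct:.1f}% smaller")
```

The result depends heavily on how long the keys are compared to the values, so it is worth running a measurement like this against your own payloads.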

Reduction Comparison

To evaluate each format, I generated sample datasets with 5 to 17 columns and 10 to 1,000,000 rows. Here are the results for the size reduction from array-object to array-value.

  • The average reduction from array-object to array-value is 48%, 53%, and 54%.
  • The minimum reduction is 43%.
  • In general, the more rows and columns, the more effective the reduction.
  • However, 1,000 rows is the sweet spot for reduction; beyond that, it drops slightly up to 1 million rows.
  • A higher number of columns is more effective: there is a big gap between 5 columns and 10 or 17 columns.
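
One way to reason about where the savings come from is to count the bytes each row spends on keys. The sketch below is my own back-of-the-envelope model, not part of the measurements: it assumes compact JSON (no spaces after separators) and ASCII keys that need no escaping, and the function names are hypothetical. Under those assumptions every key costs its length plus 3 bytes per row (two quotes and a colon), while array-value pays for the keys once plus a small fixed wrapper.

```python
import json


def key_overhead_per_row(headers):
    """Bytes each row spends on keys: the key, two quotes, and a colon."""
    # Assumption: ASCII keys that need no JSON escaping.
    return sum(len(key) + 3 for key in headers)


def predicted_saving(headers, n_rows):
    """Bytes saved by array-value over array-object (compact JSON)."""
    k = key_overhead_per_row(headers)
    # Array-object pays k bytes in every row. Array-value pays k + 1 bytes
    # once for the header list, plus 20 bytes for the
    # {"headers":...,"data":...} wrapper.
    return n_rows * k - (k + 1) - 20


def measured_saving(records):
    """Actual byte difference between the two compact serializations."""
    headers = list(records[0].keys())
    table = {"headers": headers, "data": [list(r.values()) for r in records]}
    compact = {"separators": (",", ":")}
    object_json = json.dumps(records, **compact)
    value_json = json.dumps(table, **compact)
    return len(object_json) - len(value_json)


records = [
    {"id": 1, "name": "Alice", "role": "Admin"},
    {"id": 2, "name": "Bob", "role": "User"},
    {"id": 3, "name": "Charlie", "role": "User"},
]
headers = list(records[0].keys())
print(key_overhead_per_row(headers))  # (2+3) + (4+3) + (4+3) = 19
print(predicted_saving(headers, len(records)), measured_saving(records))
```

In this model the fixed cost is spread over more rows as the dataset grows, and each extra column adds its key length to the per-row saving, which fits the general trend in the bullets above. The percentage reduction still depends on how long the values are relative to the keys.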

Based on this analysis, array-value is more efficient because it cuts the data size by 40-50% compared with array-object. However, this raises two questions: how does compression affect these two formats, and how large is the reduction once both are compressed?

Reduction with Compression

Further analysis compares the compression efficiency of gzip when applied to both array-object and array-value structures. Since compression algorithms primarily target repetitive data, this comparison evaluates how each structure minimizes redundancy.
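
A comparison like this can be sketched with Python's built-in gzip module. This is an illustrative sketch only: the sample rows are hypothetical and much more repetitive than Faker data, the helper names sizes and compare are my own, and gzip runs at its default compression level, so the printed percentages will differ from the results reported here.

```python
import gzip
import json


def sizes(payload):
    """Return (raw_bytes, gzip_bytes) for a JSON-serializable payload."""
    raw = json.dumps(payload).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


def compare(records):
    """Reduction from array-object to array-value, raw and gzipped."""
    headers = list(records[0].keys())
    table = {"headers": headers, "data": [list(r.values()) for r in records]}
    obj_raw, obj_gz = sizes(records)
    val_raw, val_gz = sizes(table)
    return {
        "raw_reduction_pct": 100 * (obj_raw - val_raw) / obj_raw,
        "gzip_reduction_pct": 100 * (obj_gz - val_gz) / obj_gz,
    }


# Hypothetical sample rows; not the Faker dataset used in this post.
records = [
    {
        "id": i,
        "name": f"user-{i * 7919 % 1000}",
        "role": "Admin" if i % 5 == 0 else "User",
    }
    for i in range(1, 1001)
]

result = compare(records)
print(f"uncompressed: {result['raw_reduction_pct']:.1f}% smaller")
print(f"gzip:         {result['gzip_reduction_pct']:.1f}% smaller")
```

Repeated keys are exactly the kind of redundancy gzip removes, so the gap between the two formats is expected to shrink once both are compressed.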

  • After compression, the average reduction from array-object to array-value is 8-10%.
  • The minimum reduction is 1.3%, for the dataset with 10 rows and 5 columns.

As we can see, compression produces nearly identical file sizes for both formats, with a reduction of less than 10%. However, increasing the number of columns improved the reduction to a maximum of 12%.

Additionally, because array-value structures are smaller than array-object structures to begin with, the uncompressed data achieves a reduction of at least 43%, with compression providing a further 1.3–3.5% improvement. In summary, the array-value structure is consistently more space-efficient than the array-object format.

Conclusion

The array-value structure is more effective for datasets with a higher number of columns, showing a 10-point improvement compared to datasets with fewer columns.

The data shows that array-value is better for sending large amounts of data. Even so, in practice you don’t need to rush to change your existing array-object payloads, as compression might be enough to shrink them. Still, if you have massive datasets with many columns, it is definitely worth switching to array-value.
