In today's episode we take a deep dive into the factors that influence partitions and how to choose a good partitioning based on them. The factors include:
Max partition bytes (spark.sql.files.maxPartitionBytes)
Open cost in bytes (spark.sql.files.openCostInBytes)
Number of cores
File size
Number of files
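To make the interplay of these five factors concrete, here is a minimal sketch in plain Python (no Spark needed). It mirrors, as far as I understand it, the heuristic Spark's file source uses to pick a split size and pack files into read partitions. The function names max_split_bytes and estimate_partitions are my own for illustration, the defaults of 128 MiB and 4 MiB are the documented Spark defaults, and the sketch assumes every file is splittable and that the number of cores equals the default parallelism. Treat the result as an estimate, not as what your cluster is guaranteed to produce.

```python
MIB = 1024 * 1024


def max_split_bytes(file_sizes, num_cores,
                    max_partition_bytes=128 * MIB,
                    open_cost_in_bytes=4 * MIB):
    """Estimate the split size used when reading files (illustrative helper)."""
    # Every file is padded with the open cost before the total is computed.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // num_cores
    # Never above max partition bytes, never below the open cost.
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


def estimate_partitions(file_sizes, num_cores,
                        max_partition_bytes=128 * MIB,
                        open_cost_in_bytes=4 * MIB):
    """Estimate the number of read partitions (assumes splittable files)."""
    split = max_split_bytes(file_sizes, num_cores,
                            max_partition_bytes, open_cost_in_bytes)

    # Cut each file into chunks of at most `split` bytes.
    chunks = []
    for size in file_sizes:
        offset = 0
        while offset < size:
            chunks.append(min(split, size - offset))
            offset += split

    # Pack chunks, largest first, into partitions of roughly `split` bytes.
    chunks.sort(reverse=True)
    partitions = 0
    current_size = 0
    has_chunks = False
    for chunk in chunks:
        if has_chunks and current_size + chunk > split:
            partitions += 1
            current_size = 0
            has_chunks = False
        # Each chunk also "costs" the open cost inside its partition.
        current_size += chunk + open_cost_in_bytes
        has_chunks = True
    if has_chunks:
        partitions += 1
    return partitions


if __name__ == "__main__":
    one_big_file = [1024 * MIB]      # a single 1 GiB file
    many_small_files = [1 * MIB] * 100  # one hundred 1 MiB files
    print(estimate_partitions(one_big_file, num_cores=8))
    print(estimate_partitions(many_small_files, num_cores=8))
```

Playing with the inputs shows why the factors matter together: with many small files the open cost dominates the padded size, so far fewer files fit into one partition than the raw file sizes would suggest, while with few cores and little data the split size shrinks below max partition bytes.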
As always, feel free to comment on or challenge my explanations. I am happy to learn more from the community myself.
Link to the code can be found here: github.com/datanikkthegreek/SparkDeltaDatabricksIn…