
How to improve the performance of iterating over 130 items

craig asked 5 months ago

I have to iterate over 130 Data Transfer Objects, and each one generates a JSON file to be uploaded to AWS S3.

With no improvements, it takes around 90 seconds to complete the whole process. I tried using a lambda and not using a lambda, with the same results for both.

    for (AbstractDTO dto : dtos) {
        try {
            processDTO(dealerCode, yearPeriod, monthPeriod, dto);
        } catch (FileAlreadyExistsInS3Exception e) {
            failedToUploadDTOs.add(e.getLocalizedMessage() + ": " + dto.fileName() + ".json");
        }
    }

    dtos.stream().forEach(dto -> {
        try {
            processDTO(dealerCode, yearPeriod, monthPeriod, dto);
        } catch (FileAlreadyExistsInS3Exception e) {
            failedToUploadDTOs.add(e.getLocalizedMessage() + ": " + dto.fileName() + ".json");
        }
    });

After some investigation, I concluded that the method processDTO takes around 0.650 seconds per item to run (130 items × 0.65 s ≈ the 90-second total).
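A per-item figure like this can be measured with a monotonic clock; the sketch below is illustrative, with `doWork`'s sleep standing in for processDTO's real work (the names and the 5 ms duration are assumptions, not from the original code):

```java
public class TimeIt {
    // Average wall-clock milliseconds per run of `task` over `iterations` runs.
    static double avgMillis(Runnable task, int iterations) {
        long start = System.nanoTime(); // nanoTime is monotonic: right clock for elapsed time
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        double avg = avgMillis(() -> {
            try {
                Thread.sleep(5); // hypothetical stand-in for processDTO(...)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 10);
        System.out.println("avg per item (ms): " + avg);
    }
}
```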

My first attempt was to use parallel streams, and the results were pretty good, taking around 15 seconds to complete the whole process:

    dtos.parallelStream().forEach(dto -> {
        try {
            processDTO(dealerCode, yearPeriod, monthPeriod, dto);
        } catch (FileAlreadyExistsInS3Exception e) {
            failedToUploadDTOs.add(e.getLocalizedMessage() + ": " + dto.fileName() + ".json");
        }
    });
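One caveat worth checking here: if failedToUploadDTOs is an ordinary ArrayList, adding to it from a parallel stream's worker threads is a data race. A minimal sketch of collecting failures safely with a concurrent collection (the every-10th-item failure condition is simulated, not from the original code):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

public class ThreadSafeCollect {
    // Collect "failures" from a parallel stream into a concurrent queue.
    static Queue<String> collectFailures(int itemCount) {
        // ConcurrentLinkedQueue (or Collections.synchronizedList) is safe to
        // mutate from multiple worker threads; a plain ArrayList is not.
        Queue<String> failed = new ConcurrentLinkedQueue<>();
        IntStream.range(0, itemCount).parallel().forEach(i -> {
            if (i % 10 == 0) { // simulated failure condition
                failed.add("item-" + i + ".json");
            }
        });
        return failed;
    }

    public static void main(String[] args) {
        System.out.println(collectFailures(130).size()); // prints 13
    }
}
```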

But I still need to decrease that time.
I researched ways to tune parallel streams and discovered the ForkJoinPool trick:

    ForkJoinPool forkJoinPool = new ForkJoinPool(PARALLELISM_NUMBER);
    try {
        forkJoinPool.submit(() ->
            dtos.parallelStream().forEach(dto -> {
                try {
                    processDTO(dealerCode, yearPeriod, monthPeriod, dto);
                } catch (FileAlreadyExistsInS3Exception e) {
                    failedToUploadDTOs.add(e.getLocalizedMessage() + ": " + dto.fileName() + ".json");
                }
            })).get(); // get() can throw InterruptedException / ExecutionException
    } finally {
        forkJoinPool.shutdown(); // shut the pool down even if get() throws
    }

Unfortunately, the results were a bit confusing for me.

  • When PARALLELISM_NUMBER is 8, it takes around 13 seconds to complete the whole process. Not a big improvement.
  • When PARALLELISM_NUMBER is 16, it takes around 8 seconds to complete the whole process.
  • When PARALLELISM_NUMBER is 32, it takes around 5 seconds to complete the whole process.

All tests were done using Postman requests, calling the controller method which ends up iterating over the 130 items.

I’m satisfied with 5 seconds, using 32 as PARALLELISM_NUMBER, but I’m worried about the consequences.

  • Is it ok to keep 32?
  • What is the ideal PARALLELISM_NUMBER?
  • What do I have to keep in mind when deciding its value?
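A common way to reason about this is the pool-sizing heuristic from "Java Concurrency in Practice": threads ≈ cores × utilization × (1 + waitTime / computeTime). Because each item here mostly waits on S3 round-trips, the wait/compute ratio is large, which is why 32 threads can beat 8 on a 4-core machine. A sketch (the 600 ms / 50 ms split below is illustrative, not measured):

```java
public class ParallelismSizing {
    // threads = cores * utilization * (1 + waitTime / computeTime)
    // For I/O-bound work, waitMs dwarfs computeMs, so the pool can be
    // much larger than the core count without oversubscribing the CPU.
    static int suggestedThreads(int cores, double utilization,
                                double waitMs, double computeMs) {
        return (int) (cores * utilization * (1 + waitMs / computeMs));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(suggestedThreads(cores, 1.0, 600, 50));
    }
}
```

The value to keep in mind is that the right number depends on how long each task blocks, and on downstream limits (S3 request rates, connection-pool size), not just on the CPU.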

I’m running on a Mac with a 2.2 GHz Core i7:

sysctl hw.physicalcpu hw.logicalcpu
hw.physicalcpu: 4
hw.logicalcpu: 8

Here’s what processDTO does:

private void processDTO(int dealerCode, int yearPeriod, int monthPeriod, AbstractDTO dto)
        throws FileAlreadyExistsInS3Exception {
    String flatJson = JsonFlattener.flatten(new JSONObject(dto).toString());
    String jsonFileName = dto.fileName() + JSON_TYPE;
    String jsonFilePath = buildFilePathNew(dto.endpoint(), dealerCode, yearPeriod, monthPeriod, AWS_S3_JSON_ROOT_FOLDER);
    uploadFileToS3(jsonFilePath + jsonFileName, flatJson);
}

public void uploadFileToS3(String fileName, String fileContent) throws FileAlreadyExistsInS3Exception {
    if (s3client.doesObjectExist(bucketName, fileName)) {
        throw new FileAlreadyExistsInS3Exception(ErrorMessages.FILE_ALREADY_EXISTS_IN_S3.getMessage());
    }
    s3client.putObject(bucketName, fileName, fileContent);
}
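Since each item spends most of its time in two blocking S3 calls (doesObjectExist plus putObject), one alternative worth considering is a dedicated ExecutorService instead of a custom ForkJoinPool: the pool size is explicit, and the blocking work stays off the shared ForkJoinPool.commonPool that parallel streams use. A sketch, where `upload` is a hypothetical stand-in for processDTO that sleeps to mimic an S3 round-trip:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DedicatedPoolUpload {
    // Hypothetical stand-in for processDTO: sleeps to mimic an S3
    // round-trip and returns the "uploaded" file name.
    static String upload(int id) throws InterruptedException {
        Thread.sleep(10);
        return "item-" + id + ".json";
    }

    static List<String> uploadAll(int itemCount, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 0; i < itemCount; i++) {
                final int id = i;
                tasks.add(() -> upload(id));
            }
            // invokeAll blocks until every task finishes and returns the
            // futures in the same order the tasks were submitted.
            List<String> names = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                names.add(f.get());
            }
            return names;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(uploadAll(130, 32).size()); // prints 130
    }
}
```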
1 Answer

Best Answer
Amit answered 5 months ago