compare

command module
v0.0.0-...-49a881d Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 20, 2022 License: GPL-3.0 Imports: 17 Imported by: 0

README

Comparison of sorting algorithms in Golang

dot.png

When we have any computer tool, we want to know how fast it is. This is a normal life situation. In the case of Golang, we have a concurrency scheme. What is it? Will it be efficient, for example, to sort digits? Concurrency and parallelism are not the same thing, but what can we have as a result of this? If we build a sorting program by concurrency scheme, what effect can we expect?

This experiment explores the following algorithms:

  • sort.Ints() - standard sorting of integers.

  • RadixSort() - this is not quite a sort in the sense that there is no function for exchanging values, however, RadixSort makes it possible understand how fast you can "sort" integers.

  • QuickSort() - implementation of quick sort.

  • QuickSort_parallel() - implementation of a modified quicksort algorithm, where goroutines are naturally embedded where there is a recurrent call of the computational function.

  • TimSort() - implementation of Tim Peters's mergesort sorting algorithm.

Results

The experiment for synthetic data.

By synthetic data, we understand the data generated using the random number generator according to the uniform distribution law.

Figure_1.png

  • The parallel version of quicksort - QuickSort_parallel() implemented on goroutines demonstrates the best result among exchange sorts. If exchange sort is not required, then your choice is RadixSort().

The experiment for "natural" data.

By "natural" data we understand data taken from the real world, which are brought together into an integral system and generated by a certain logic of human use. The main feature of natural data is the empirical fact that the data is not extremely mixed, as is the case with synthetic data, and there are almost sorted ranges or ranges with close numbers in the arithmetic sense in the data. In this particular case, the data is a sequence of integers as a result of the compact storage of the video data stream.

  • You can see, that the results are very strongly dependent on the length of the vectors. This result looks like an error.

Figure_2_1.png

To check, there is sorting another array. There are more repeating fragments in this array, which, in fact, do not need to be sorted.

Figure_3.png

  • The big surprise for me was the empirical fact that QuickSort() turned out to be far from fast.

Figure_2_2.png

  • TimSort() for vectors of dimension 10^8 did not show much advantage in comparison tests for data of different nature.
  • Standard sorting sort.Ints() worked better.
  • RadixSort() also performed better.

Figure_2_3.png

  • QuickSort() is starting to lose.

Figure_2_4.png

  • However, for sorting problems up to a vector length of 50000, QuickSort() can be a good choice.

Figure_2_5.png

  • For sorting problems up to a vector length of 10000, QuickSort() is the best choice among exchange algorithms.
  • Up to length 9000, the cost of the algorithm for organizing the parallelization procedure is higher than the gain in time as a result of the parallelization itself.

What is the general conclusion?

You should carefully study the nature of the data and select a sorting algorithm for a specific sorting range to maximize the effect of the sorting algorithm, that is, to minimaze the time of run. I think that RAM has to be spare much less often. These are production challenges.

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL