C 語言練習程式(9) -- bucket sort & radix sort -- 指標相關程式集錦(8)


Posted by nathan2009729 on 2022-09-12

bucket sort原理(字串版):
目標是把英文單詞依照第一個英文字的字母順序排列。bucket是籃子,因為ascii字符集只會有8個bit,所以要準備2的8次方個籃子,每個籃子會有一個值(int buckets[]),代表這個籃子所代表的字,它在output的第幾個開始。
舉個例子吧。假設現在有這些字串要排序:
"foo", "boo", "bar", "qoo", "qar", "baz", "qux", "qaz"
那麼經過一連串的運算後,我應該要得到buckets['q'] == 4,代表排序完以後q開頭的單詞會從鎮列第四個位置開始排。
bucket sort的pseudo code如下:

  1. 先算出buckets陣列的每一個值,也就是每一個字的開頭位置(void compute_buckets)
  2. 掃描input,根據buckets[某字]的值對照該input應該擺哪裡,每擺一個就對buckets[某字]+1。
    (void sort_strings)
    完整程式碼如下:
#include <stdio.h>

void compute_buckets(int n, char *array[n], int buckets[256])
{
  for (int i = 0; i < 256; i++) {
    buckets[i] = 0;
  }

  for (int i = 0; i < n; i++) {
    unsigned char bucket = (unsigned char)array[i][0];
    buckets[bucket]++;
  }

  int m = n;
  for (int i = 256 - 1; i >= 0; i--) {
    int count = buckets[i];
    buckets[i] = m - count;
    m -= count;
  }
}

void sort_strings(int n, char *input[n], char *output[n])
{
  int buckets[256];
  compute_buckets(n, input, buckets);
  for (int i = 0; i < n; i++) {
    unsigned char bucket = (unsigned char)input[i][0];
    int index = buckets[bucket]++;
    output[index] = input[i];
  }
}

int main(void)
{
  char *array[] = {
    "foo", "boo", "bar", "qoo", "qar", "baz", "qux", "qaz"
  };
  int n = sizeof array / sizeof *array;
  char *output[n];

  sort_strings(n, array, output);

  for (int i = 0; i < n; i++) {
    printf("%s\n", output[i]);
  }

  return 0;
}

接下來就可以開始講radix sort,這裡以3位數的排序來舉例。
假設需要排序的數列是: 7,121,8,35,16,44,那麼我們需要10個桶子代表0-9,對個位數、十位數、百位數進行3次的bucket sort。
第一回合如下(個位數):

輸出會是121,44,35,16,7,8(輸出方式是由左到右,由上而下),再拿這個output來進行第二回合:

這樣輸出會是7,8,16,121,35,44
再拿這個output當作input來進行第三回合,可以發現除了121以外的數字都照input順序由上到下排在0的桶子裡面,照由左到右,由上而下的輸出即可完成排序。
問題來了,要怎麼把前一回合輸出的陣列當作下一回合的輸入? 答案是下面這個函式:

void radix_sort(int n, int array[n])
{
  // It is *very* unlikely that sizeof an integer is odd, but if
  // it is, you need to move the results from helper
  // to array. I assume that we have an even number of bytes
  // because that is practically always true for int
  static_assert(sizeof *array % 2 == 0,
                "integer sizes must be powers of two");

  // Helper buffer; handle input/output switches
  // when bucket sorting
  int helper[n];
  // For switching between the buffers
  int *buffers[] = { array, helper };
  int bucket_input = 0;

  for (int offset = 0; offset < sizeof *array; offset++) {
    bucket_sort(n, offset,
                buffers[bucket_input],
                buffers[!bucket_input]);
    bucket_input = !bucket_input;
  }

}

重要!

這裡多出了一個helper陣列,更有趣的是互換機制,是立了一個bucket_input當作flag,每一回合要嘛0要嘛1,對應int *buffers[] = { array, helper };的第0個跟第1個,來當作陣input、output切換。

以範例程式來說,要排的是int,我們可以不用個十百千萬這樣照十進位位數做bucket sort,因為int一定是4個byte,所以每次比一個byte比四次,即可得出正確答案。而1個byte有8個bit,所以需要256大小的buckets陣列。

但是負數的問題要處理,處理方式是把數列分成正數塊跟負數塊,再分別比較。具體做法,是排序前要先整理數列,把負數集中在左邊,正數集中在右邊。做法是兩個函式scan_rightscan_left分別回傳最右邊的負數位置跟最左邊的正數位置,再利用swap函數做交換。如此持續直到scan_right的回傳值大於scan_left為止。

int *scan_right(int *left, int *right)
{
  while (left < right) {
    if (*left >= 0) break;
    left++;
  }
  return left;
}

// Both left and right must point to legal addresses
int *scan_left(int *left, int *right)
{
  while (left < right) {
    if (*right < 0) break;
    right--;
  }
  return right;
}

// Both left and right must point to legal addresses
void swap(int *left, int *right)
{
  int i = *left;
  *left = *right;
  *right = i;
}

int split(int n, int array[n])
{
  int *left = array, *right = array + n - 1;
  while (left < right) {
    left = scan_right(left, right);
    right = scan_left(left, right);
    swap(left, right);
  }
  return left - array;
}

而正負數混合數列的排序,其程式碼如下:

void sort_int(int n, int array[n])
{
  if (n <= 0) return;
  int m = split(n, array);
  if (m > 0) {
    radix_sort(m, array);
    reverse(m, array);
  }
  if (m < n) {
    radix_sort(n - m, array + m);
  }
}

完整程式碼:

#include <stdio.h>
#include <assert.h>

void bucket_sort(int n, int offset,
                 int const input[n], int output[n])
{
  int buckets[256];
  for (int i = 0; i < 256; i++) {
    buckets[i] = 0;
  }

  for (int i = 0; i < n; i++) {
    unsigned char bucket = (input[i] >> 8 * offset) & 0xff;
    buckets[bucket]++;
  }

  int m = n;
  for (int i = 256 - 1; i >= 0; i--) {
    int count = buckets[i];
    buckets[i] = m - count;
    m -= count;
  }

  for (int i = 0; i < n; i++) {
    unsigned char bucket = (input[i] >> 8 * offset) & 0xff;
    int index = buckets[bucket]++;
    output[index] = input[i];
  }
}

void radix_sort(int n, int array[n])
{
  // It is *very* unlikely that sizeof an integer is odd, but if
  // it is, you need to move the results from helper
  // to array. I assume that we have an even number of bytes
  // because that is practically always true for int
  static_assert(sizeof *array % 2 == 0,
                "integer sizes must be powers of two");

  // Helper buffer; handle input/output switches
  // when bucket sorting
  int helper[n];
  // For switching between the buffers
  int *buffers[] = { array, helper };
  int bucket_input = 0;

  for (int offset = 0; offset < sizeof *array; offset++) {
    bucket_sort(n, offset,
                buffers[bucket_input],
                buffers[!bucket_input]);
    bucket_input = !bucket_input;
  }

}

// Both left and right must point to legal addresses
int *scan_right(int *left, int *right)
{
  while (left < right) {
    if (*left >= 0) break;
    left++;
  }
  return left;
}

// Both left and right must point to legal addresses
int *scan_left(int *left, int *right)
{
  while (left < right) {
    if (*right < 0) break;
    right--;
  }
  return right;
}

// Both left and right must point to legal addresses
void swap(int *left, int *right)
{
  int i = *left;
  *left = *right;
  *right = i;
}

int split(int n, int array[n])
{
  int *left = array, *right = array + n - 1;
  while (left < right) {
    left = scan_right(left, right);
    right = scan_left(left, right);
    swap(left, right);
  }
  return left - array;
}

void reverse(int n, int array[n])
{
  int *left = array, *right = array + n - 1;
  while (left < right) {
    swap(left++, right--);
  }
}

void sort_int(int n, int array[n])
{
  if (n <= 0) return;
  int m = split(n, array);
  if (m > 0) {
    radix_sort(m, array);
    reverse(m, array);
  }
  if (m < n) {
    radix_sort(n - m, array + m);
  }
}

int main(void)
{
  int array[] = { -1, -2, 13, 12, 4, 4200, 13, 6, 14, -3, 42, 13 };
  int n = sizeof array / sizeof *array;

  radix_sort(n, array);
  for (int i = 0; i < n; i++) {
    printf("%d ", array[i]);
  }
  printf("\n");

  sort_int(n, array);
  for (int i = 0; i < n; i++) {
    printf("%d ", array[i]);
  }
  printf("\n");

  return 0;
}

#bucket sort #radix sort







Related Posts

Scrapy 和 Redis 分散式crawlers (蘋果日報為例)

Scrapy 和 Redis 分散式crawlers (蘋果日報為例)

報告注意事項

報告注意事項

[Week2] - JavaScript :邏輯 & 位元運算子

[Week2] - JavaScript :邏輯 & 位元運算子


Comments