Golang High-Performance Optimization in Practice

An introduction to performance tuning techniques for Go code.


Preface

Go benchmarks in detail

Next, let's measure the performance of the following code:

func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

func BenchmarkFib10(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Fib(10)
	}
}

Run go test -bench=. -benchmem to also collect memory allocation statistics for the code:

(base) ➜  benchmark git:(main) ✗ go test -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: go-performance-optimization/benchmark
cpu: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
BenchmarkFib10-4         4193748               291.2 ns/op             0 B/op          0 allocs/op
PASS
ok      go-performance-optimization/benchmark   2.111s
(base) ➜  benchmark git:(main)

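The fields on the sixth line of the output (the one beginning with BenchmarkFib10-4) mean:

  • BenchmarkFib10-4: BenchmarkFib10 is the name of the benchmark function, and -4 means GOMAXPROCS was 4
  • 4193748: the loop body ran 4193748 times in total, i.e. the value of b.N
  • 291.2 ns/op: each iteration took 291.2 ns on average
  • 0 B/op: bytes of memory allocated per iteration
  • 0 allocs/op: number of memory allocations per iteration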
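
The same four figures can also be read programmatically. The following is a minimal sketch, assuming you want to run the benchmark from an ordinary program rather than through go test; it uses testing.Benchmark from the standard library, and the sink variable is an addition that is not part of the original code.

```go
package main

import (
	"fmt"
	"testing"
)

// Fib is the same naive recursive Fibonacci used in the benchmark above.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// sink keeps the result reachable so the call is not discarded
// (an addition for this sketch, not part of the original code).
var sink int

func main() {
	// testing.Benchmark picks b.N automatically, like "go test -bench".
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs() // collect allocation statistics for this benchmark
		for i := 0; i < b.N; i++ {
			sink = Fib(10)
		}
	})

	fmt.Println("iterations (b.N):", res.N)
	fmt.Println("ns/op:", res.NsPerOp())
	fmt.Println("B/op:", res.AllocedBytesPerOp())
	fmt.Println("allocs/op:", res.AllocsPerOp())
}
```

The absolute numbers depend on the machine and Go version, so they will differ from the output shown above.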

Whenever possible, provide a capacity when initializing a slice with make()

Test code:

func NoPreAlloc(size int) {
	data := make([]int, 0)
	for i := 0; i < size; i++ {
		data = append(data, i)
	}
}

func PreAlloc(size int) {
	data := make([]int, 0, size)
	for i := 0; i < size; i++ {
		data = append(data, i)
	}
}

func BenchmarkNoPreAlloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		NoPreAlloc(1000)
	}
}

func BenchmarkPreAlloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		PreAlloc(1000)
	}
}

Results:

(base) ➜  go-performance-optimization git:(main) ✗ go test ./slice -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: go-performance-optimization/slice
cpu: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
BenchmarkNoPreAlloc-4             217953              4963 ns/op           25208 B/op         12 allocs/op
BenchmarkPreAlloc-4               640620              1809 ns/op            8192 B/op          1 allocs/op
PASS
ok      go-performance-optimization/slice       2.890s
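
To see where the extra allocations come from, the sketch below counts how often append has to move to a larger backing array. It is an illustration only: countGrowths is a hypothetical helper, and the exact number of growth steps without a capacity depends on the Go version's growth policy.

```go
package main

import "fmt"

// countGrowths appends n ints to a slice created with the given initial
// capacity and reports how many times the capacity changed, i.e. how many
// times append had to allocate a larger backing array and copy the data.
func countGrowths(n, initialCap int) int {
	data := make([]int, 0, initialCap)
	growths := 0
	lastCap := cap(data)
	for i := 0; i < n; i++ {
		data = append(data, i)
		if cap(data) != lastCap {
			growths++
			lastCap = cap(data)
		}
	}
	return growths
}

func main() {
	fmt.Println("growths without capacity:", countGrowths(1000, 0))
	fmt.Println("growths with capacity:", countGrowths(1000, 1000))
}
```

With the capacity supplied up front the slice never grows, which matches the single allocation reported for BenchmarkPreAlloc.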

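Creating a slice from an existing slice does not create a new underlying array

Scenario:

  • The original slice is large, and the code creates a small slice from it
  • The original underlying array is still referenced in memory, so it cannot be freed

Solution: use copy instead of re-slicing

Code: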

func generateWithCap(n int) []int {
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	nums := make([]int, 0, n)
	for i := 0; i < n; i++ {
		nums = append(nums, r.Int())
	}
	return nums
}

func printMem(t *testing.T) {
	t.Helper()
	var rtm runtime.MemStats
	runtime.ReadMemStats(&rtm)
	t.Logf("%.2f MB", float64(rtm.Alloc)/1024./1024.)
}

func testLastChars(t *testing.T, f func([]int) []int) {
	t.Helper()
	ans := make([][]int, 0)
	for k := 0; k < 100; k++ {
		origin := generateWithCap(128 * 1024) // 128Ki ints * 8 bytes = 1 MiB on 64-bit
		ans = append(ans, f(origin))
	}
	printMem(t)
	_ = ans
}

func GetLastBySlice(origin []int) []int {
	return origin[len(origin)-2:]
}

func GetLastByCopy(origin []int) []int {
	ret := make([]int, 2)
	copy(ret, origin[len(origin)-2:])
	return ret
}

func TestLastCharsBySlice(t *testing.T) {
	testLastChars(t, GetLastBySlice)
}

func TestLastCharsByCopy(t *testing.T) {
	testLastChars(t, GetLastByCopy)
}

Results:

(base) ➜  go-performance-optimization git:(main) ✗ go test ./slice -run=^TestLastChars -v
=== RUN   TestLastCharsBySlice
    slice_test.go:74: 100.24 MB
--- PASS: TestLastCharsBySlice (0.15s)
=== RUN   TestLastCharsByCopy
    slice_test.go:78: 3.12 MB
--- PASS: TestLastCharsByCopy (0.15s)
PASS
ok      go-performance-optimization/slice       0.759s

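The difference is striking. GetLastBySlice used 100.24 MB of memory, which means that none of the 100 allocated 1 MB blocks was reclaimed. Although each returned slice only uses the last 2 elements, it shares the underlying array with the original 1 MB slice, so that array cannot be freed and the full 100 MB stays allocated. GetLastByCopy, by contrast, used only 3.12 MB. Because copy writes the elements into a new underlying array, the original array can be reclaimed by the garbage collector (GC) once origin is no longer referenced.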
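
The sharing described above can be observed directly on a small slice. This is a minimal sketch; lastTwoBySlice and lastTwoByCopy are illustrative names that mirror the two functions in the test code, and both assume the input has at least two elements.

```go
package main

import "fmt"

// lastTwoBySlice returns a view into origin's backing array.
// No new array is created, so origin's array stays reachable.
func lastTwoBySlice(origin []int) []int {
	return origin[len(origin)-2:]
}

// lastTwoByCopy copies the last two elements into a fresh array,
// so the result holds no reference to origin's backing array.
func lastTwoByCopy(origin []int) []int {
	ret := make([]int, 2)
	copy(ret, origin[len(origin)-2:])
	return ret
}

func main() {
	origin := []int{1, 2, 3, 4, 5}
	view := lastTwoBySlice(origin)
	dup := lastTwoByCopy(origin)

	// A write through origin is visible in view (same array) but not in dup.
	origin[3] = 40
	fmt.Println(view[0], dup[0]) // prints: 40 4

	// view[0] and origin[3] are the same memory location; dup[0] is not.
	fmt.Println(&view[0] == &origin[3], &dup[0] == &origin[3]) // prints: true false
}
```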

Calling runtime.GC() explicitly inside the loop makes the effect even more pronounced:

func testLastChars(t *testing.T, f func([]int) []int) {
	t.Helper()
	ans := make([][]int, 0)
	for k := 0; k < 100; k++ {
		origin := generateWithCap(128 * 1024) // 128Ki ints * 8 bytes = 1 MiB on 64-bit
		ans = append(ans, f(origin))
		runtime.GC() // force a garbage collection
	}
	printMem(t)
	_ = ans
}

Results:

(base) ➜  go-performance-optimization git:(main) ✗ go test ./slice -run=^TestLastChars -v
=== RUN   TestLastCharsBySlice
    slice_test.go:75: 100.11 MB
--- PASS: TestLastCharsBySlice (0.14s)
=== RUN   TestLastCharsByCopy
    slice_test.go:79: 0.11 MB
--- PASS: TestLastCharsByCopy (0.09s)
PASS
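ok      go-performance-optimization/slice       0.812s

  • Repeatedly adding elements to a map triggers map growth
  • Pre-allocating space reduces the cost of memory copying and rehashing
  • Estimate the required space in advance based on actual needs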

Code:

func NoPreAlloc(size int) {
	data := make(map[int]int)
	for i := 0; i < size; i++ {
		data[i] = i
	}
}

func PreAlloc(size int) {
	data := make(map[int]int, size)
	for i := 0; i < size; i++ {
		data[i] = i
	}
}

func BenchmarkNoPreAlloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		NoPreAlloc(1000)
	}
}

func BenchmarkPreAlloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		PreAlloc(1000)
	}
}

Results:

(base) ➜  go-performance-optimization git:(main) ✗ go test ./map -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: go-performance-optimization/map
cpu: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
BenchmarkNoPreAlloc-4              16059             74349 ns/op           86551 B/op         64 allocs/op
BenchmarkPreAlloc-4                37770             29789 ns/op           41097 B/op          6 allocs/op
PASS
ok      go-performance-optimization/map 3.938s
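
The allocation difference can also be checked without a full benchmark run. The sketch below is one possible way to do it, assuming testing.AllocsPerRun from the standard library is acceptable outside a test; fill and allocsFor are hypothetical helpers, and the exact counts vary with the Go version's map implementation.

```go
package main

import (
	"fmt"
	"testing"
)

// fill inserts size entries into a map created with the given size hint.
// A hint of 0 behaves like make(map[int]int).
func fill(size, hint int) int {
	data := make(map[int]int, hint)
	for i := 0; i < size; i++ {
		data[i] = i
	}
	return len(data)
}

// allocsFor reports the average number of heap allocations per call to fill.
func allocsFor(size, hint int) float64 {
	return testing.AllocsPerRun(100, func() { fill(size, hint) })
}

func main() {
	fmt.Println("allocs without hint:", allocsFor(1000, 0))
	fmt.Println("allocs with hint:", allocsFor(1000, 1000))
}
```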

Common ways to concatenate strings:

func Plus(n int, str string) string {
	s := ""
	for i := 0; i < n; i++ {
		s += str
	}
	return s
}

func StrBuilder(n int, str string) string {
	var builder strings.Builder
	for i := 0; i < n; i++ {
		builder.WriteString(str)
	}
	return builder.String()
}

func ByteBuffer(n int, str string) string {
	buf := new(bytes.Buffer)
	for i := 0; i < n; i++ {
		buf.WriteString(str)
	}
	return buf.String()
}

func BenchmarkPlus(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Plus(100, "abc")
	}
}

func BenchmarkStrBuilder(b *testing.B) {
	for i := 0; i < b.N; i++ {
		StrBuilder(100, "abc")
	}
}

func BenchmarkByteBuffer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ByteBuffer(100, "abc")
	}
}

Results:

(base) ➜  go-performance-optimization git:(main) ✗ go test ./string -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: go-performance-optimization/string
cpu: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
BenchmarkPlus-4                   181742              6054 ns/op           15992 B/op         99 allocs/op
BenchmarkStrBuilder-4            2173779               689.9 ns/op          1016 B/op          7 allocs/op
BenchmarkByteBuffer-4            1253618               915.8 ns/op          1280 B/op          5 allocs/op
PASS
ok      go-performance-optimization/string      5.977s

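Conclusion: concatenating with + performs worst; strings.Builder and bytes.Buffer are close, with strings.Builder being faster

Analysis:

  • Strings in Go are immutable, and the memory a string occupies has a fixed size
  • Every + operation allocates new memory
  • strings.Builder and bytes.Buffer are both backed by a []byte slice
  • Thanks to the slice growth strategy, they do not need to allocate on every write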
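
The last point can be made visible by watching the Builder's capacity while writing. This is only a sketch: builderGrowths is a hypothetical helper, and the number of growth steps is an implementation detail that may change between Go versions.

```go
package main

import (
	"fmt"
	"strings"
)

// builderGrowths writes str n times and counts how often the Builder's
// internal buffer had to grow, detected as a change in Cap().
func builderGrowths(n int, str string) (result string, growths int) {
	var b strings.Builder
	lastCap := b.Cap()
	for i := 0; i < n; i++ {
		b.WriteString(str)
		if b.Cap() != lastCap {
			growths++
			lastCap = b.Cap()
		}
	}
	return b.String(), growths
}

func main() {
	s, growths := builderGrowths(100, "abc")
	fmt.Println("length:", len(s))
	fmt.Println("writes: 100, buffer growths:", growths)
}
```

The buffer grows far fewer times than there are writes, whereas s += str has to build a new string on nearly every iteration.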

First, the source code:

// strings.Builder
// String returns the accumulated string.
func (b *Builder) String() string {
	return unsafe.String(unsafe.SliceData(b.buf), len(b.buf))
}

// bytes.Buffer
// String returns the contents of the unread portion of the buffer
// as a string. If the [Buffer] is a nil pointer, it returns "<nil>".
//
// To build strings more efficiently, see the strings.Builder type.
func (b *Buffer) String() string {
	if b == nil {
		// Special case, useful in debugging.
		return "<nil>"
	}
	return string(b.buf[b.off:])
}

From the source we can see:

  • bytes.Buffer allocates a new block of memory when it is converted to a string
  • strings.Builder converts its underlying []byte directly into a string, without copying

Both types can also reserve capacity up front with Grow:

func PreStrBuilder(n int, str string) string {
	var builder strings.Builder
	builder.Grow(n * len(str))
	for i := 0; i < n; i++ {
		builder.WriteString(str)
	}
	return builder.String()
}

func PreByteBuffer(n int, str string) string {
	buf := new(bytes.Buffer)
	buf.Grow(n * len(str))
	for i := 0; i < n; i++ {
		buf.WriteString(str)
	}
	return buf.String()
}

func BenchmarkPreStrBuilder(b *testing.B) {
	for i := 0; i < b.N; i++ {
		PreStrBuilder(100, "abc")
	}
}

func BenchmarkPreByteBuffer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		PreByteBuffer(100, "abc")
	}
}

Results:

(base) ➜  go-performance-optimization git:(main) ✗ go test ./string -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: go-performance-optimization/string
cpu: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
BenchmarkPlus-4                   191355              6443 ns/op           15992 B/op         99 allocs/op
BenchmarkStrBuilder-4            2216818               503.1 ns/op          1016 B/op          7 allocs/op
BenchmarkByteBuffer-4            1288650              1528 ns/op            1280 B/op          5 allocs/op
BenchmarkPreStrBuilder-4         2805319               468.2 ns/op           320 B/op          1 allocs/op
BenchmarkPreByteBuffer-4         1594519               877.2 ns/op           640 B/op          2 allocs/op
PASS
ok      go-performance-optimization/string      12.373s

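As the results above show, bytes.Buffer still allocates twice even with Grow: once for the buffer and once more for the copy made in String(). strings.Builder allocates only once.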
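
The same comparison can be reproduced in isolation. The sketch below assumes testing.AllocsPerRun is an acceptable measuring tool; preStrBuilder and preByteBuffer mirror the functions above, while sink, builderAllocs and bufferAllocs are hypothetical additions. Exact counts may differ between Go versions.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"testing"
)

// sink keeps the results reachable so the allocations are not optimized away.
var sink string

func preStrBuilder(n int, str string) string {
	var b strings.Builder
	b.Grow(n * len(str))
	for i := 0; i < n; i++ {
		b.WriteString(str)
	}
	return b.String() // reuses the buffer as the string's memory
}

func preByteBuffer(n int, str string) string {
	var buf bytes.Buffer
	buf.Grow(n * len(str))
	for i := 0; i < n; i++ {
		buf.WriteString(str)
	}
	return buf.String() // copies the bytes into a new string
}

// builderAllocs and bufferAllocs report average heap allocations per call.
func builderAllocs() float64 {
	return testing.AllocsPerRun(100, func() { sink = preStrBuilder(100, "abc") })
}

func bufferAllocs() float64 {
	return testing.AllocsPerRun(100, func() { sink = preByteBuffer(100, "abc") })
}

func main() {
	fmt.Println("strings.Builder allocs:", builderAllocs())
	fmt.Println("bytes.Buffer allocs:", bufferAllocs())
}
```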

An empty struct value occupies no memory, so it can serve as a placeholder in many situations, for example when implementing a set with a map

func EmptyStructMap(n int) {
	m := make(map[int]struct{})
	for i := 0; i < n; i++ {
		m[i] = struct{}{}
	}
}

func BoolMap(n int) {
	m := make(map[int]bool)
	for i := 0; i < n; i++ {
		m[i] = false
	}
}

func BenchmarkEmptyStructMap(b *testing.B) {
	for i := 0; i < b.N; i++ {
		EmptyStructMap(1000)
	}
}

func BenchmarkBoolMap(b *testing.B) {
	for i := 0; i < b.N; i++ {
		BoolMap(1000)
	}
}

Results:

(base) ➜  go-performance-optimization git:(main) ✗ go test ./struct -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: go-performance-optimization/struct
cpu: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
BenchmarkEmptyStructMap-4          17043             84522 ns/op           47735 B/op         65 allocs/op
BenchmarkBoolMap-4                 10000            130275 ns/op           53316 B/op         73 allocs/op
PASS
ok      go-performance-optimization/struct      4.088s
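
As a small illustration of the placeholder use, here is a sketch of a set built on map[int]struct{}. IntSet, its methods and emptyStructSize are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// IntSet is a hypothetical set of ints; the empty struct value carries no data.
type IntSet map[int]struct{}

// Add inserts v into the set; adding an existing value is a no-op.
func (s IntSet) Add(v int) { s[v] = struct{}{} }

// Has reports whether v is in the set.
func (s IntSet) Has(v int) bool {
	_, ok := s[v]
	return ok
}

// emptyStructSize returns the size in bytes of an empty struct value.
func emptyStructSize() uintptr { return unsafe.Sizeof(struct{}{}) }

func main() {
	// An empty struct has size 0, a bool has size 1.
	fmt.Println(emptyStructSize(), unsafe.Sizeof(false)) // prints: 0 1

	s := IntSet{}
	s.Add(1)
	s.Add(1)
	s.Add(2)
	fmt.Println(len(s), s.Has(1), s.Has(3)) // prints: 2 true false
}
```

Using struct{} also makes the intent explicit: only the presence of a key matters, so there is no value that could be mistaken for meaningful data.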
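
  • Locks are the heavier mechanism: a contended sync.Mutex has to park and wake goroutines through the runtime scheduler, and OS-level locks involve system calls
  • atomic operations are implemented with hardware instructions and are cheaper than locks
  • sync.Mutex should be used to protect a section of logic, not just a single variable
  • For non-numeric values, use atomic.Value, which can hold an interface{}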
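
For the last point, a minimal sketch of atomic.Value holding a non-numeric value could look like this. Config, current, loadConfig and storeConfig are hypothetical names; note that a value must be stored before the first Load, and every Store must use the same concrete type.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Config is a hypothetical read-mostly configuration value.
type Config struct {
	Endpoint string
	Timeout  int
}

// current always holds a *Config once storeConfig has been called.
var current atomic.Value

// storeConfig atomically publishes a new configuration.
func storeConfig(c *Config) { current.Store(c) }

// loadConfig atomically reads the latest configuration.
// It panics if storeConfig has never been called.
func loadConfig() *Config { return current.Load().(*Config) }

func main() {
	storeConfig(&Config{Endpoint: "a.example", Timeout: 1})
	fmt.Println(loadConfig().Endpoint) // prints: a.example

	// Replace the whole value instead of mutating it in place.
	storeConfig(&Config{Endpoint: "b.example", Timeout: 2})
	fmt.Println(loadConfig().Endpoint) // prints: b.example
}
```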

Code:

type atomicCounter struct {
	i int32
}

func AtomicAddOne(c *atomicCounter) {
	atomic.AddInt32(&c.i, 1)
}

type mutexCounter struct {
	i int32
	sync.Mutex
}

func MutexAddOne(c *mutexCounter) {
	c.Lock()
	c.i++
	c.Unlock()
}

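func BenchmarkAtomicAddOne(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// A fresh counter is allocated on every iteration, so the
		// allocation is part of the measured cost (see allocs/op).
		c := new(atomicCounter)
		AtomicAddOne(c)
	}
}

func BenchmarkMutexAddOne(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// Same here: the mutexCounter allocation is measured as well.
		c := new(mutexCounter)
		MutexAddOne(c)
	}
}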

Results:

(base) ➜  go-performance-optimization git:(main) ✗ go test ./atomic -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: go-performance-optimization/atomic
cpu: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
BenchmarkAtomicAddOne-4         69622866                17.44 ns/op            4 B/op          1 allocs/op
BenchmarkMutexAddOne-4          34958937                32.44 ns/op           16 B/op          1 allocs/op
PASS
ok      go-performance-optimization/atomic      3.967s
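
The benchmark above increments a fresh counter from a single goroutine. As a complement, the sketch below shares one counter between several goroutines, which is the situation both tools are meant for; countAtomic and countMutex are hypothetical helpers, and the sketch checks correctness rather than speed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countAtomic increments one shared counter from several goroutines
// using atomic.AddInt32 and returns the final value.
func countAtomic(goroutines, perGoroutine int) int32 {
	var n int32
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perGoroutine; i++ {
				atomic.AddInt32(&n, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&n)
}

// countMutex does the same work but guards the counter with a sync.Mutex.
func countMutex(goroutines, perGoroutine int) int32 {
	var (
		n  int32
		mu sync.Mutex
		wg sync.WaitGroup
	)
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perGoroutine; i++ {
				mu.Lock()
				n++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return n
}

func main() {
	fmt.Println(countAtomic(8, 1000), countMutex(8, 1000)) // prints: 8000 8000
}
```

Both versions produce the correct total; for a single numeric variable like this one the atomic version is the lighter choice, while the mutex is the right tool once several fields or steps must change together.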
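
  • Avoiding common performance pitfalls is enough to keep most programs fast
  • For ordinary applications, do not chase performance blindly
  • The more sophisticated the optimization technique, the more likely it is to cause problems
  • Improve performance only while still meeting quality requirements such as correctness, reliability, simplicity, and clarity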