# Investigating open() and close() performance

The other day the question came up of how much the performance of
open() affects a certain system. My mental model was that opening a
file that does not exist yet would have to go to disk and perform
sync() calls. Eventually I decided to write a simple tester for this
use case: open-tester.cpp opens one thousand files.

Combined with the Linux perf tool, this puts us in a convenient
position to benchmark this particular system call.

The question I want to answer is whether open() or close() are
synchronous, i.e. whether they go to disk, and if they do, how often.

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    #include <sys/types.h>
    #include <sys/stat.h>

    #include <cstdlib>

    /* taken from https://stackoverflow.com/questions/440133/how-do-i-create-a-random-alpha-numeric-string-in-c */
    /* Fills s with len random alphanumeric characters plus a terminating NUL,
       so s must have room for len + 1 bytes. rand() is never seeded, so every
       run generates the same sequence of names. */
    void gen_random(char *s, const int len) {
        static const char alphanum[] =
            "0123456789"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz";

        for (int i = 0; i < len; ++i) {
            s[i] = alphanum[rand() % (sizeof(alphanum) - 1)];
        }

        s[len] = 0;
    }

    int main(int argc, char **argv) {
        /* Both flags default to off; argv[1] may contain 'w' and/or 'c'. */
        bool do_write = false;
        bool do_close = false;

        if (argc > 1) {
            if (strchr(argv[1], 'w') != NULL) {
                do_write = true;
            }
            if (strchr(argv[1], 'c') != NULL) {
                do_close = true;
            }
        }

        const int num_opens = 1000;
        char names[num_opens][11];
        int fds[num_opens];

        for (int i = 0; i < num_opens; i++) {
            gen_random(names[i], 10);
        }

        for (int i = 0; i < num_opens; i++) {
            /* O_CREAT requires an explicit mode argument. */
            fds[i] = open(names[i], O_CREAT | O_WRONLY | O_TRUNC, 0644);
            if (do_write && fds[i] >= 0) {
                write(fds[i], "0123456789", 10);
            }
        }

        if (do_close) {
            for (int i = 0; i < num_opens; i++) {
                /* 0 is a valid descriptor; only negative values signal failure. */
                if (fds[i] >= 0) {
                    close(fds[i]);
                }
            }
        }

        return 0;
    }


I then ran this program with three different options.

# Just open the files

We're passing O_CREAT to open(), so this entails creating the file. I
wasn't sure if this would go directly to disk.

perf stat -e 'block:*' ../open-perf

Performance counter stats for '../open-perf':

| Count | Event                      |
|------:|:---------------------------|
| 5,935 | block:block_touch_buffer   |
|     0 | block:block_dirty_buffer   |
|     0 | block:block_rq_abort       |
|     0 | block:block_rq_requeue     |
|     0 | block:block_rq_complete    |
|     0 | block:block_rq_insert      |
|     0 | block:block_rq_issue       |
|     0 | block:block_bio_bounce     |
|     0 | block:block_bio_complete   |
|     0 | block:block_bio_backmerge  |
|     0 | block:block_bio_frontmerge |
|     0 | block:block_bio_queue      |
|     0 | block:block_getrq          |
|     0 | block:block_sleeprq        |
|     0 | block:block_plug           |
|     0 | block:block_unplug         |
|     0 | block:block_split          |
|     0 | block:block_bio_remap      |
|     0 | block:block_rq_remap       |

0.006021314 seconds time elapsed


# Open and close the files

The second test case opens 1000 files and then closes them. Given
O_CREAT, my assumption was that the close() operation would go
directly to disk to persist the file (name).

perf stat -e 'block:*' ../open-perf c

Performance counter stats for '../open-perf c':

| Count | Event                      |
|------:|:---------------------------|
| 5,935 | block:block_touch_buffer   |
|     0 | block:block_dirty_buffer   |
|     0 | block:block_rq_abort       |
|     0 | block:block_rq_requeue     |
|     0 | block:block_rq_complete    |
|     0 | block:block_rq_insert      |
|     0 | block:block_rq_issue       |
|     0 | block:block_bio_bounce     |
|     0 | block:block_bio_complete   |
|     0 | block:block_bio_backmerge  |
|     0 | block:block_bio_frontmerge |
|     0 | block:block_bio_queue      |
|     0 | block:block_getrq          |
|     0 | block:block_sleeprq        |
|     0 | block:block_plug           |
|     0 | block:block_unplug         |
|     0 | block:block_split          |
|     0 | block:block_bio_remap      |
|     0 | block:block_rq_remap       |

0.006067496 seconds time elapsed


Contrary to my intuition, this did not lead to any block IO operations
being submitted to the disk. It simply seems to dirty the
corresponding inode in memory. This is a very nice outcome, and this
path looks well optimized.

# Open, write to and close the files

The last test case opens the files, writes ten bytes to each and then
bulk closes all of them. This should lead to block IO being performed;
my assumption here is again that close() is synchronous.

perf stat -e 'block:*' ../open-perf cw

Performance counter stats for '../open-perf cw':

|  Count | Event                      |
|-------:|:---------------------------|
| 10,868 | block:block_touch_buffer   |
|  1,000 | block:block_dirty_buffer   |
|      0 | block:block_rq_abort       |
|     19 | block:block_rq_requeue     |
|      0 | block:block_rq_complete    |
|    176 | block:block_rq_insert      |
|    157 | block:block_rq_issue       |
|      0 | block:block_bio_bounce     |
|      0 | block:block_bio_complete   |
|    843 | block:block_bio_backmerge  |
|      0 | block:block_bio_frontmerge |
|  1,000 | block:block_bio_queue      |
|    157 | block:block_getrq          |
|      0 | block:block_sleeprq        |
|    157 | block:block_plug           |
|    157 | block:block_unplug         |
|      0 | block:block_split          |
|  1,000 | block:block_bio_remap      |
|      0 | block:block_rq_remap       |

0.017545475 seconds time elapsed


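And indeed it led to a series of block IO requests. Note that this
example queues 1,000 block IO operations (block_bio_queue) but only
issues 157 requests (block_rq_issue), because 843 of them were merged
into already pending requests (block_bio_backmerge). That is a
testament to the effectiveness of the IO scheduler.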

From a speed perspective these operations are all negligible when
performed on my notebook. Even when writing to disk, the run issues
only 157 IO requests and finishes in under two hundredths of a
second.
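
perf stat only reports the elapsed time of the whole run. To see how that time splits across the open, write and close loops, each loop could be timed separately. The following is a minimal sketch and not part of open-tester.cpp: the names PhaseTimes, elapsed_us and time_phases are made up for illustration, and it uses sequential file names instead of random ones.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

#include <fcntl.h>
#include <unistd.h>

// Wall-clock time spent in each loop, in microseconds (hypothetical type).
struct PhaseTimes {
    long long open_us;
    long long write_us;
    long long close_us;
};

// Microseconds elapsed since `start` on the monotonic clock.
static long long elapsed_us(std::chrono::steady_clock::time_point start) {
    return std::chrono::duration_cast<std::chrono::microseconds>(
               std::chrono::steady_clock::now() - start)
        .count();
}

// Hypothetical helper: creates `count` files named <prefix>0, <prefix>1, ...
// and times the open, write and close loops separately.
PhaseTimes time_phases(const std::string &prefix, int count) {
    std::vector<int> fds(count, -1);
    PhaseTimes t{};

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; i++) {
        const std::string name = prefix + std::to_string(i);
        fds[i] = open(name.c_str(), O_CREAT | O_WRONLY | O_TRUNC, 0644);
    }
    t.open_us = elapsed_us(start);

    start = std::chrono::steady_clock::now();
    for (int fd : fds) {
        if (fd >= 0 && write(fd, "0123456789", 10) != 10) {
            perror("write");
        }
    }
    t.write_us = elapsed_us(start);

    start = std::chrono::steady_clock::now();
    for (int fd : fds) {
        if (fd >= 0) {
            close(fd);
        }
    }
    t.close_us = elapsed_us(start);

    return t;
}
```

Calling time_phases("bench_", 1000) from a main() and printing the three fields would show which loop dominates. I would expect the numbers to vary a lot with the file system and with the state of the page cache, so they are best compared across several runs.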

In a follow-up I would like to look into continuously writing random
files to disk.

So the takeaway is that open() is essentially not synchronous: it goes
directly to the dentry cache. close() does not incur any immediate
block device requests as long as no data has to be committed.
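
If a program does need its data on disk at a known point, it should not rely on open() or close() for that and would have to ask explicitly. The following is a minimal sketch using fsync(); the helper name write_durably is made up for illustration and is not part of open-tester.cpp.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: writes `len` bytes to `path` and returns true only if
// the write, the fsync() and the close() all succeeded. fsync() blocks until
// the file's data has been handed to the storage device.
// Note: persisting the directory entry of a newly created file may
// additionally require an fsync() on the containing directory.
bool write_durably(const char *path, const char *data, size_t len) {
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) {
        return false;
    }

    bool ok = write(fd, data, len) == static_cast<ssize_t>(len);
    ok = ok && fsync(fd) == 0;
    // Always close the descriptor, even if an earlier step failed.
    ok = (close(fd) == 0) && ok;
    return ok;
}
```

Running the tester with such a helper under perf stat -e 'block:*' should show block IO being issued per file rather than in one merged batch, at a correspondingly higher cost per file. I have not measured this here.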