1curl internals
2==============
3
4 - [Intro](#intro)
5 - [git](#git)
6 - [Portability](#Portability)
7 - [Windows vs Unix](#winvsunix)
8 - [Library](#Library)
9   - [`Curl_connect`](#Curl_connect)
10   - [`multi_do`](#multi_do)
11   - [`Curl_readwrite`](#Curl_readwrite)
12   - [`multi_done`](#multi_done)
13   - [`Curl_disconnect`](#Curl_disconnect)
14 - [HTTP(S)](#http)
15 - [FTP](#ftp)
16 - [Kerberos](#kerberos)
17 - [TELNET](#telnet)
18 - [FILE](#file)
19 - [SMB](#smb)
20 - [LDAP](#ldap)
21 - [E-mail](#email)
22 - [General](#general)
23 - [Persistent Connections](#persistent)
24 - [multi interface/non-blocking](#multi)
25 - [SSL libraries](#ssl)
26 - [Library Symbols](#symbols)
27 - [Return Codes and Informationals](#returncodes)
 - [API/ABI](#abi)
29 - [Client](#client)
30 - [Memory Debugging](#memorydebug)
31 - [Test Suite](#test)
32 - [Asynchronous name resolves](#asyncdns)
33   - [c-ares](#cares)
34 - [`curl_off_t`](#curl_off_t)
35 - [curlx](#curlx)
36 - [Content Encoding](#contentencoding)
37 - [`hostip.c` explained](#hostip)
38 - [Track Down Memory Leaks](#memoryleak)
39 - [`multi_socket`](#multi_socket)
40 - [Structs in libcurl](#structs)
41   - [Curl_easy](#Curl_easy)
42   - [connectdata](#connectdata)
43   - [Curl_multi](#Curl_multi)
44   - [Curl_handler](#Curl_handler)
45   - [conncache](#conncache)
46   - [Curl_share](#Curl_share)
47   - [CookieInfo](#CookieInfo)
48
49<a name="intro"></a>
50Intro
51=====
52
 This project is split in two: the library and the client. The client part
 uses the library, but the library is designed to allow other applications to
 use it.
56
57 The largest amount of code and complexity is in the library part.
58
59
60<a name="git"></a>
61git
62===
63
64 All changes to the sources are committed to the git repository as soon as
65 they're somewhat verified to work. Changes shall be committed as independently
66 as possible so that individual changes can be easily spotted and tracked
67 afterwards.
68
69 Tagging shall be used extensively, and by the time we release new archives we
70 should tag the sources with a name similar to the released version number.
71
72<a name="Portability"></a>
73Portability
74===========
75
 We write curl and libcurl to compile with C89 compilers, on 32-bit and larger
 machines. Most of libcurl assumes more or less POSIX compliance, but that is
 not a requirement.
79
 We write libcurl to build and work with lots of third-party tools, and we
 want it to remain functional and buildable with these and later versions
 (older versions may still work, but they are not what we work hard to
 maintain):
83
84Dependencies
85------------
86
87 - OpenSSL      0.9.7
88 - GnuTLS       3.1.10
89 - zlib         1.1.4
90 - libssh2      0.16
91 - c-ares       1.6.0
92 - libidn2      2.0.0
93 - wolfSSL      2.0.0
94 - openldap     2.0
95 - MIT Kerberos 1.2.4
96 - GSKit        V5R3M0
97 - NSS          3.14.x
98 - Heimdal      ?
99 - nghttp2      1.12.0
100 - WinSock      2.2 (on Windows 95+ and Windows CE .NET 4.1+)
101
102Operating Systems
103-----------------
104
105 On systems where configure runs, we aim at working on them all - if they have
106 a suitable C compiler. On systems that don't run configure, we strive to keep
107 curl running correctly on:
108
109 - Windows      98
110 - AS/400       V5R3M0
111 - Symbian      9.1
112 - Windows CE   ?
113 - TPF          ?
114
115Build tools
116-----------
117
118 When writing code (mostly for generating stuff included in release tarballs)
119 we use a few "build tools" and we make sure that we remain functional with
120 these versions:
121
122 - GNU Libtool  1.4.2
123 - GNU Autoconf 2.57
124 - GNU Automake 1.7
125 - GNU M4       1.4
126 - perl         5.004
127 - roffit       0.5
128 - groff        ? (any version that supports `groff -Tps -man [in] [out]`)
129 - ps2pdf (gs)  ?
130
131<a name="winvsunix"></a>
132Windows vs Unix
133===============
134
135 There are a few differences in how to program curl the Unix way compared to
136 the Windows way. Perhaps the four most notable details are:
137
138 1. Different function names for socket operations.
139
140   In curl, this is solved with defines and macros, so that the source looks
141   the same in all places except for the header file that defines them. The
142   macros in use are `sclose()`, `sread()` and `swrite()`.
143
144 2. Windows requires a couple of init calls for the socket stuff.
145
   That's taken care of by the `curl_global_init()` call, but if other
   libraries also do it, there might be reasons for applications to alter
   that behaviour.
149
150   We require WinSock version 2.2 and load this version during global init.
151
152 3. The file descriptors for network communication and file operations are
153    not as easily interchangeable as in Unix.
154
155   We avoid this by not trying any funny tricks on file descriptors.
156
157 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
158    destroying binary data, although you do want that conversion if it is
159    text coming through... (sigh)
160
   We set stdout to binary mode under Windows.
162
 Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
 conditionals that deal with features *should* instead be in the format
 `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
 we maintain a `config-win32.h` file in the lib directory that is supposed to
 look exactly like a `curl_config.h` file would have looked on a Windows
 machine!
169
170 Generally speaking: always remember that this will be compiled on dozens of
171 operating systems. Don't walk on the edge!
172
173<a name="Library"></a>
174Library
175=======
176
177 (See [Structs in libcurl](#structs) for the separate section describing all
178 major internal structs and their purposes.)
179
 There are plenty of entry points to the library, namely each publicly defined
 function that libcurl offers to applications. All of those functions are
 rather small and easy to follow. All the ones prefixed with `curl_easy` are
 put in the `lib/easy.c` file.
184
 `curl_global_init()` and `curl_global_cleanup()` should be called by the
 application to initialize and clean up global stuff in the library. As of
 today, `curl_global_init()` handles the global SSL initialization if SSL is
 enabled, and it initializes the socket layer on Windows machines. libcurl
 itself has no "global" scope.
189
190 All printf()-style functions use the supplied clones in `lib/mprintf.c`. This
191 makes sure we stay absolutely platform independent.
192
 [`curl_easy_init()`][2] allocates an internal struct and makes some
194 initializations.  The returned handle does not reveal internals. This is the
195 `Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
196 functions. All connections performed will get connect-specific data allocated
197 that should be used for things related to particular connections/requests.
198
199 [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
200 be passed in pairs: the parameter-ID and the parameter-value. The list of
201 options is documented in the man page. This function mainly sets things in
202 the `Curl_easy` struct.
203
 `curl_easy_perform()` is just a wrapper function that makes use of the multi
 API. It basically calls `curl_multi_init()` and `curl_multi_add_handle()`,
 then calls `curl_multi_wait()` and `curl_multi_perform()` in a loop until the
 transfer is done, and then returns.
208
209 Some of the most important key functions in `url.c` are called from
210 `multi.c` when certain key steps are to be made in the transfer operation.
211
212<a name="Curl_connect"></a>
213Curl_connect()
214--------------
215
   It analyzes the URL, separates the different components and connects to
   the remote host. This may involve using a proxy and/or using SSL. The
   `Curl_resolv()` function in `lib/hostip.c` is used for looking up host
   names (it then uses the proper underlying method, which may vary between
   platforms and builds).
221
   When `Curl_connect` is done, we are connected to the remote site. Then it
   is time to tell the server to get a document/file. `multi_do()` arranges
   this.

   This function makes sure there's an allocated and initialized `connectdata`
   struct that is used for this particular connection only (although there may
   be several requests performed on the same connection). A bunch of things
   are initialized or inherited from the `Curl_easy` struct.
230
231<a name="multi_do"></a>
232multi_do()
233---------
234
235   `multi_do()` makes sure the proper protocol-specific function is called.
236   The functions are named after the protocols they handle.
237
   The protocol-specific functions of course deal with protocol-specific
   negotiations and setup. They have access to the `Curl_sendf()` function
   (from `lib/sendf.c`) to send printf-style formatted data to the remote
   host, and when they're ready to make the actual file transfer they call
   the `Curl_setup_transfer()` function (in `lib/transfer.c`) to set up the
   transfer and then return.
244
   If this DO function fails and the connection is being re-used, libcurl will
   then close this connection, set up a new connection and re-issue the DO
247   request on that. This is because there is no way to be perfectly sure that
248   we have discovered a dead connection before the DO function and thus we
249   might wrongly be re-using a connection that was closed by the remote peer.
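   The retry rule can be sketched like this. Everything in it is hypothetical
   (`struct conn`, `issue_do()`, `do_with_retry()`); the point is only that a
   failure on a *re-used* connection earns one retry on a fresh connection,
   while a failure on a brand-new connection is reported as-is.

```c
#include <stdbool.h>

/* Hypothetical result codes and connection state, for illustration only. */
typedef enum { DO_OK, DO_SEND_FAILED } do_result;

struct conn {
  bool reused;  /* was this connection taken from the cache? */
  bool dead;    /* closed by the peer without us noticing yet */
};

/* Pretend to send the DO request: a silently dead connection fails. */
static do_result issue_do(const struct conn *c)
{
  return c->dead ? DO_SEND_FAILED : DO_OK;
}

/* Retry once, but only when the failing connection was a re-used one. */
static do_result do_with_retry(struct conn *c)
{
  do_result r = issue_do(c);
  if(r != DO_OK && c->reused) {
    /* "close" the stale connection and "set up" a new one */
    c->reused = false;
    c->dead = false;  /* assumption: the fresh connection works */
    r = issue_do(c);
  }
  return r;
}
```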
250
251<a name="Curl_readwrite"></a>
252Curl_readwrite()
253----------------
254
255   Called during the transfer of the actual protocol payload.
256
257   During transfer, the progress functions in `lib/progress.c` are called at
258   frequent intervals (or at the user's choice, a specified callback might get
259   called). The speedcheck functions in `lib/speedcheck.c` are also used to
260   verify that the transfer is as fast as required.
261
262<a name="multi_done"></a>
263multi_done()
264-----------
265
   Called after a transfer is done. This function takes care of everything
   that has to be done after a transfer. It attempts to leave matters in a
   state where `multi_do()` can be called again on the same connection (in
   the persistent connection case). The connection might also be closed soon
   afterwards with `Curl_disconnect()`.
271
272<a name="Curl_disconnect"></a>
273Curl_disconnect()
274-----------------
275
   When doing normal connections and transfers, no one ever tries to close any
   connections, so this is not normally called when `curl_easy_perform()` is
   used. This function is only used when we are certain that no more transfers
   are going to be made on the connection. A connection can also be closed by
   force, or this function can be called to make sure that libcurl doesn't
   keep too many connections alive at the same time.
282
283   This function cleans up all resources that are associated with a single
284   connection.
285
286<a name="http"></a>
287HTTP(S)
288=======
289
290 HTTP offers a lot and is the protocol in curl that uses the most lines of
291 code. There is a special file `lib/formdata.c` that offers all the
292 multipart post functions.
293
 The base64 functions for user+password stuff (and more) are in
 `lib/base64.c`, and all functions for parsing and sending cookies are found
 in `lib/cookie.c`.
297
298 HTTPS uses in almost every case the same procedure as HTTP, with only two
299 exceptions: the connect procedure is different and the function used to read
300 or write from the socket is different, although the latter fact is hidden in
301 the source by the use of `Curl_read()` for reading and `Curl_write()` for
302 writing data to the remote server.
303
 `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
 encoding.
306
 An interesting detail of the HTTP(S) request is the `Curl_add_buffer()`
 series of functions we use. They append data to one single buffer, and when
 the building is finished the entire request is sent off in one single write.
 This is done to overcome problems with flawed firewalls and lame servers.
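 A minimal sketch of the single-buffer approach, assuming a hypothetical
 `struct reqbuf` type and `reqbuf_addf()` helper rather than libcurl's real
 internal API. Each header line is appended to one growing buffer, so nothing
 reaches the network until the request is complete.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable request buffer; the name and API are made up and
   only mirror the idea behind Curl_add_buffer(). */
struct reqbuf {
  char *data;  /* NUL-terminated request text built so far */
  size_t len;  /* bytes used, excluding the NUL */
  size_t cap;  /* bytes allocated */
};

/* Append printf-style formatted text. Returns 0 on success, -1 on error. */
static int reqbuf_addf(struct reqbuf *b, const char *fmt, ...)
{
  va_list ap;
  int n;

  va_start(ap, fmt);
  n = vsnprintf(NULL, 0, fmt, ap);  /* C99: measure the output first */
  va_end(ap);
  if(n < 0)
    return -1;

  if(b->len + (size_t)n + 1 > b->cap) {
    size_t newcap = (b->len + (size_t)n + 1) * 2;
    char *p = realloc(b->data, newcap);
    if(!p)
      return -1;
    b->data = p;
    b->cap = newcap;
  }

  va_start(ap, fmt);
  vsnprintf(b->data + b->len, b->cap - b->len, fmt, ap);
  va_end(ap);
  b->len += (size_t)n;
  return 0;
}
```

 Once everything is appended, the whole request can go out in one call such
 as `send(sockfd, b.data, b.len, 0)`, which is what the single-write rule
 asks for.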
312
313<a name="ftp"></a>
314FTP
315===
316
 The `Curl_if2ip()` function can be used for getting the IP address of a
 specified network interface, and it resides in `lib/if2ip.c`.
319
320 `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
321 was made a separate function to prevent us programmers from forgetting that
322 they must be CRLF terminated. They must also be sent in one single `write()`
323 to make firewalls and similar happy.
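 The same idea applied to FTP commands, as a sketch: the hypothetical
 `ftp_format_cmd()` below formats a command, appends the CRLF that FTP
 requires, and refuses to produce a truncated line, so the caller can pass
 the result to a single `write()`.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper in the spirit of Curl_ftpsendf(): format one FTP
   command and terminate it with CRLF. Returns the line length, or -1 if
   the command plus CRLF and NUL does not fit in the buffer. */
static int ftp_format_cmd(char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int n;

  va_start(ap, fmt);
  n = vsnprintf(buf, size, fmt, ap);
  va_end(ap);
  if(n < 0 || (size_t)n + 2 >= size)
    return -1;                  /* no room left for "\r\n" and the NUL */
  memcpy(buf + n, "\r\n", 3);   /* also copies the terminating NUL */
  return n + 2;
}
```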
324
325<a name="kerberos"></a>
326Kerberos
327========
328
329 Kerberos support is mainly in `lib/krb5.c` but also `curl_sasl_sspi.c` and
330 `curl_sasl_gssapi.c` for the email protocols and `socks_gssapi.c` and
331 `socks_sspi.c` for SOCKS5 proxy specifics.
332
333<a name="telnet"></a>
334TELNET
335======
336
337 Telnet is implemented in `lib/telnet.c`.
338
339<a name="file"></a>
340FILE
341====
342
343 The `file://` protocol is dealt with in `lib/file.c`.
344
345<a name="smb"></a>
346SMB
347===
348
349 The `smb://` protocol is dealt with in `lib/smb.c`.
350
351<a name="ldap"></a>
352LDAP
353====
354
355 Everything LDAP is in `lib/ldap.c` and `lib/openldap.c`.
356
357<a name="email"></a>
358E-mail
359======
360
361 The e-mail related source code is in `lib/imap.c`, `lib/pop3.c` and
362 `lib/smtp.c`.
363
364<a name="general"></a>
365General
366=======
367
368 URL encoding and decoding, called escaping and unescaping in the source code,
369 is found in `lib/escape.c`.
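 For readers new to URL escaping, here is a minimal sketch of the idea (it is
 not the `lib/escape.c` code): every byte outside the RFC 3986 "unreserved"
 set is replaced by `%` and two hex digits. `url_escape()` is a hypothetical
 name; applications should use the public `curl_easy_escape()` and
 `curl_easy_unescape()` functions instead.

```c
#include <stdlib.h>
#include <string.h>

/* Keep ALPHA, DIGIT, '-', '.', '_' and '~'; percent-encode everything else.
   Returns a malloc()ed string that the caller must free(), or NULL. */
static char *url_escape(const char *in)
{
  static const char hex[] = "0123456789ABCDEF";
  char *out = malloc(strlen(in) * 3 + 1);  /* worst case: all bytes %XX */
  char *p = out;

  if(!out)
    return NULL;
  for(; *in; in++) {
    unsigned char c = (unsigned char)*in;
    /* explicit ASCII ranges avoid locale-dependent isalnum() */
    if((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
       (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' ||
       c == '~') {
      *p++ = (char)c;
    }
    else {
      *p++ = '%';
      *p++ = hex[c >> 4];
      *p++ = hex[c & 0x0F];
    }
  }
  *p = '\0';
  return out;
}
```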
370
371 While transferring data in `Transfer()` a few functions might get used.
372 `curl_getdate()` in `lib/parsedate.c` is for HTTP date comparisons (and
373 more).
374
375 `lib/getenv.c` offers `curl_getenv()` which is for reading environment
376 variables in a neat platform independent way. That's used in the client, but
377 also in `lib/url.c` when checking the proxy environment variables. Note that
378 contrary to the normal unix `getenv()`, this returns an allocated buffer that
379 must be `free()`ed after use.
380
381 `lib/netrc.c` holds the `.netrc` parser.
382
383 `lib/timeval.c` features replacement functions for systems that don't have
384 `gettimeofday()` and a few support functions for timeval conversions.
385
386 A function named `curl_version()` that returns the full curl version string
387 is found in `lib/version.c`.
388
389<a name="persistent"></a>
390Persistent Connections
391======================
392
393 The persistent connection support in libcurl requires some considerations on
394 how to do things inside of the library.
395
396 - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
397   must never hold connection-oriented data. It is meant to hold the root data
398   as well as all the options etc that the library-user may choose.
399
400 - The `Curl_easy` struct holds the "connection cache" (an array of
401   pointers to `connectdata` structs).
402
403 - This enables the 'curl handle' to be reused on subsequent transfers.
404
405 - When libcurl is told to perform a transfer, it first checks for an already
406   existing connection in the cache that we can use. Otherwise it creates a
407   new one and adds that to the cache. If the cache is full already when a new
408   connection is added, it will first close the oldest unused one.
409
410 - When the transfer operation is complete, the connection is left
411   open. Particular options may tell libcurl not to, and protocols may signal
412   closure on connections and then they won't be kept open, of course.
413
 - When `curl_easy_cleanup()` is called, we close all connections that are
   still open, unless of course the multi interface "owns" the connections.
416
417 The curl handle must be re-used in order for the persistent connections to
418 work.
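 The cache behaviour described in the list above can be sketched with a small
 fixed-size array. This is a deliberately simplified model: `struct
 cached_conn`, `get_conn()`, `release_conn()` and the cache size of four are
 all hypothetical, and the real cache matches on far more than host name and
 port.

```c
#include <stdbool.h>
#include <string.h>

#define CACHE_SIZE 4  /* hypothetical fixed cache size */

struct cached_conn {
  bool used;                /* slot holds a connection */
  bool in_use;              /* busy with a transfer right now */
  char host[64];
  int port;
  unsigned long last_used;  /* logical time of last use, for eviction */
};

static struct cached_conn cache[CACHE_SIZE];
static unsigned long now_tick;

/* Re-use an idle connection to host:port, or create a new one. When the
   cache is full, the oldest idle connection is "closed" to make room.
   Returns NULL if every slot is busy. */
static struct cached_conn *get_conn(const char *host, int port)
{
  struct cached_conn *slot = NULL, *oldest = NULL;
  int i;

  now_tick++;
  for(i = 0; i < CACHE_SIZE; i++) {
    struct cached_conn *c = &cache[i];
    if(!c->used) {
      if(!slot)
        slot = c;
      continue;
    }
    if(!c->in_use && c->port == port && !strcmp(c->host, host)) {
      c->in_use = true;  /* re-use the existing connection */
      c->last_used = now_tick;
      return c;
    }
    if(!c->in_use && (!oldest || c->last_used < oldest->last_used))
      oldest = c;
  }
  if(!slot)
    slot = oldest;       /* evict the oldest idle connection */
  if(!slot)
    return NULL;
  memset(slot, 0, sizeof(*slot));
  strncpy(slot->host, host, sizeof(slot->host) - 1);
  slot->port = port;
  slot->used = slot->in_use = true;
  slot->last_used = now_tick;
  return slot;
}

/* Transfer complete: leave the connection open for later re-use. */
static void release_conn(struct cached_conn *c)
{
  c->in_use = false;
}
```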
419
420<a name="multi"></a>
421multi interface/non-blocking
422============================
423
 The multi interface is a non-blocking interface to the library. To make that
 interface work as well as possible, no low-level function within libcurl
 may be written to work in a blocking manner. (There are still a few spots
 violating this rule.)
428
429 One of the primary reasons we introduced c-ares support was to allow the name
430 resolve phase to be perfectly non-blocking as well.
431
 The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
 the code to allow non-blocking operations even on multi-stage
 command-response protocols. They are built around state machines that return
 when they would otherwise block waiting for data. The DICT, LDAP and TELNET
 protocols are crappy examples and they are subject to a rewrite in the future
 to better fit the libcurl protocol family.
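 A sketch of what "return instead of blocking" means for a multi-stage
 command-response protocol. The states, the `login_step()` function and the
 response codes it checks are illustrative assumptions (loosely modelled on an
 FTP login), not code from `lib/ftp.c`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical states for a tiny greeting/USER/PASS exchange. */
enum login_state { ST_GREETING, ST_USER, ST_PASS, ST_DONE };
enum step_result { STEP_AGAIN, STEP_DONE, STEP_ERROR };

struct login {
  enum login_state state;
};

/* Advance using the server response line received so far. `line` is NULL
   when no complete response has arrived: instead of blocking we return
   STEP_AGAIN and expect to be called again, e.g. when the socket becomes
   readable. */
static enum step_result login_step(struct login *l, const char *line)
{
  if(!line)
    return STEP_AGAIN;  /* would block: hand control back */
  switch(l->state) {
  case ST_GREETING:
    if(strncmp(line, "220", 3))
      return STEP_ERROR;
    /* a real implementation would send "USER ..." here */
    l->state = ST_USER;
    return STEP_AGAIN;
  case ST_USER:
    if(strncmp(line, "331", 3))
      return STEP_ERROR;
    /* ...and "PASS ..." here */
    l->state = ST_PASS;
    return STEP_AGAIN;
  case ST_PASS:
    if(strncmp(line, "230", 3))
      return STEP_ERROR;
    l->state = ST_DONE;
    return STEP_DONE;
  default:
    return STEP_DONE;
  }
}
```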
438
439<a name="ssl"></a>
440SSL libraries
441=============
442
 Originally libcurl supported SSLeay for SSL/TLS transports. That support was
 then extended to its successor OpenSSL, and has since also been extended to
 several other SSL/TLS libraries. We expect and hope to extend it further in
 future libcurl versions.
447
 To deal with this internally in the best way possible, we have a generic SSL
 function API as provided by the `vtls/vtls.[ch]` system, and those are the
 only SSL functions we may use from within libcurl. vtls is then crafted to
 use the appropriate lower-level function calls for whatever SSL library is
 in use. For example, `vtls/openssl.[ch]` handles the OpenSSL library.
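 The dispatch idea can be sketched as a table of function pointers, one table
 per TLS library, with the rest of the library calling only generic wrappers.
 All names below (`struct tls_backend`, `fake_backend`, `tls_init()`,
 `tls_version()`) are hypothetical; the real vtls structures have many more
 members and different signatures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced backend interface. */
struct tls_backend {
  const char *name;
  int (*init)(void);
  size_t (*version)(char *buf, size_t size);
};

/* A stand-in "TLS library" implementation. */
static int fake_init(void)
{
  return 1;
}

static size_t fake_version(char *buf, size_t size)
{
  const char *v = "FakeTLS/1.0";
  size_t n = strlen(v);
  if(size == 0)
    return 0;
  if(n >= size)
    n = size - 1;  /* truncate to fit */
  memcpy(buf, v, n);
  buf[n] = '\0';
  return n;
}

static const struct tls_backend fake_backend = {
  "faketls", fake_init, fake_version
};

/* The build (or runtime setup) selects exactly one backend table. */
static const struct tls_backend *tls = &fake_backend;

/* Generic entry points: the only TLS calls the rest of the code makes. */
static int tls_init(void)
{
  return tls->init();
}

static size_t tls_version(char *buf, size_t size)
{
  return tls->version(buf, size);
}
```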
453
454<a name="symbols"></a>
455Library Symbols
456===============
457
458 All symbols used internally in libcurl must use a `Curl_` prefix if they're
459 used in more than a single file. Single-file symbols must be made static.
460 Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
461 but they are to be changed to follow this pattern in future versions.) Public
462 API functions are marked with `CURL_EXTERN` in the public header files so
463 that all others can be hidden on platforms where this is possible.
464
465<a name="returncodes"></a>
466Return Codes and Informationals
467===============================
468
 I've made things simple. Almost every function in libcurl returns a
 `CURLcode`, which must be `CURLE_OK` if everything is OK or otherwise a
 suitable error code as defined in the `curl/curl.h` include file. The very
 spot that detects an error must use the `Curl_failf()` function to set the
 human-readable error description.
474
 To help the user understand what's happening and to debug curl usage, we
 must supply a fair number of informational messages by using the
 `Curl_infof()` function. Those messages are only displayed when the user
 explicitly asks for them. They are best used when revealing information that
 isn't otherwise obvious.
480
481<a name="abi"></a>
482API/ABI
483=======
484
 We make an effort to not export or show internals or how internals work, as
 that makes it easier to keep a solid API/ABI over time. See `docs/libcurl/ABI`
 for our promise to users.
488
489<a name="client"></a>
490Client
491======
492
493 `main()` resides in `src/tool_main.c`.
494
 `src/tool_hugehelp.c` is automatically generated by the `mkhelp.pl` perl
 script to display the complete "manual", and the `src/tool_urlglob.c` file
 holds the functions used for the URL-"globbing" support: the expansion of
 `{}` and `[]` patterns in URLs.
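 To illustrate the `{}` part of globbing, here is a much-simplified sketch:
 `expand_braces()` is hypothetical and only expands the first `{a,b,...}` set
 in a pattern. The real `src/tool_urlglob.c` also handles `[]` ranges,
 multiple sets and escaping.

```c
#include <stdio.h>
#include <string.h>

#define GLOB_MAX 16   /* hypothetical limits for this sketch */
#define URL_MAX 256

/* Expand the first "{a,b,...}" set into `out`, one URL per alternative.
   Returns the number of URLs written (at most `max`, which must be >= 1). */
static int expand_braces(const char *pattern, char out[][URL_MAX], int max)
{
  const char *open = strchr(pattern, '{');
  const char *close = open ? strchr(open, '}') : NULL;
  const char *item;
  int count = 0;

  if(!open || !close) {  /* no set: the pattern is the only URL */
    snprintf(out[0], URL_MAX, "%s", pattern);
    return 1;
  }
  for(item = open + 1; item <= close && count < max; item++) {
    const char *end = item;
    while(end < close && *end != ',')
      end++;
    /* prefix + this alternative + everything after the closing brace */
    snprintf(out[count++], URL_MAX, "%.*s%.*s%s",
             (int)(open - pattern), pattern,
             (int)(end - item), item, close + 1);
    item = end;  /* the loop increment then skips the ',' or '}' */
  }
  return count;
}
```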
499
500 The client mostly sets up its `config` struct properly, then
501 it calls the `curl_easy_*()` functions of the library and when it gets back
502 control after the `curl_easy_perform()` it cleans up the library, checks
503 status and exits.
504
 When the operation is done, the `ourWriteOut()` function in
 `src/tool_writeout.c` may be called to report about the operation. That
 function mostly uses the `curl_easy_getinfo()` function to extract useful
 information from the curl session.
509
510 It may loop and do all this several times if many URLs were specified on the
511 command line or config file.
512
513<a name="memorydebug"></a>
514Memory Debugging
515================
516
 The file `lib/memdebug.c` contains debug versions of a few functions: those,
 such as `malloc()`, `free()`, `fopen()`, `fclose()`, etc., that somehow deal
 with resources that might give us problems if we "leak" them. The functions
 in the memdebug system do nothing fancy; they do their normal function and
 then log information about what they just did. The logged data can then be
 analyzed after a complete session.
523
524 `memanalyze.pl` is the perl script present in `tests/` that analyzes a log
525 file generated by the memory tracking system. It detects if resources are
526 allocated but never freed and other kinds of errors related to resource
527 management.
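 A sketch of the wrap-and-log technique described above. The functions
 `dbg_malloc()` and `dbg_free()`, the macros and the log line format are made
 up for illustration; in particular the format is not guaranteed to match
 what `memanalyze.pl` parses.

```c
#include <stdio.h>
#include <stdlib.h>

static FILE *memlog;      /* where to log; NULL turns logging off */
static long live_allocs;  /* blocks allocated but not yet freed */

/* Do the normal work, then log what happened and where it was called. */
static void *dbg_malloc(size_t size, const char *file, int line)
{
  void *p = malloc(size);
  if(p)
    live_allocs++;
  if(memlog)
    fprintf(memlog, "MEM %s:%d malloc(%lu) = %p\n", file, line,
            (unsigned long)size, p);
  return p;
}

static void dbg_free(void *p, const char *file, int line)
{
  if(p)
    live_allocs--;
  if(memlog)
    fprintf(memlog, "MEM %s:%d free(%p)\n", file, line, p);
  free(p);
}

/* Callers use the macros so each log line records its call site. */
#define DBG_MALLOC(n) dbg_malloc((n), __FILE__, __LINE__)
#define DBG_FREE(p)   dbg_free((p), __FILE__, __LINE__)
```

 A leak then shows up as a `malloc` line with no matching `free` line for the
 same pointer, which is the kind of pairing an analysis script can do after
 the session.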
528
 Internally, the preprocessor symbol `DEBUGBUILD` guards code that is only
 compiled for debug-enabled builds, and the symbol `CURLDEBUG` is used to
 differentiate code which is _only_ used for memory tracking/debugging.
533
 Use `-DCURLDEBUG` when compiling to enable memory debugging; this is also
 switched on by running configure with `--enable-curldebug`. Use
 `-DDEBUGBUILD` when compiling to enable a debug build, or run configure with
 `--enable-debug`.
538
539 `curl --version` will list 'Debug' feature for debug enabled builds, and
540 will list 'TrackMemory' feature for curl debug memory tracking capable
541 builds. These features are independent and can be controlled when running
542 the configure script. When `--enable-debug` is given both features will be
543 enabled, unless some restriction prevents memory tracking from being used.
544
545<a name="test"></a>
546Test Suite
547==========
548
549 The test suite is placed in its own subdirectory directly off the root in the
550 curl archive tree, and it contains a bunch of scripts and a lot of test case
551 data.
552
553 The main test script is `runtests.pl` that will invoke test servers like
554 `httpserver.pl` and `ftpserver.pl` before all the test cases are performed.
555 The test suite currently only runs on Unix-like platforms.
556
 You'll find a description of the test suite in the `tests/README` file, and
 a description of the test case data file format in the `tests/FILEFORMAT`
 file.
559
560 The test suite automatically detects if curl was built with the memory
561 debugging enabled, and if it was, it will detect memory leaks, too.
562
563<a name="asyncdns"></a>
564Asynchronous name resolves
565==========================
566
567 libcurl can be built to do name resolves asynchronously, using either the
568 normal resolver in a threaded manner or by using c-ares.
569
570<a name="cares"></a>
571[c-ares][3]
572------
573
### Build libcurl to use c-ares

1. `./configure --enable-ares=/path/to/ares/install`
2. `make`
578
579### c-ares on win32
580
 First I compiled c-ares. I changed the default C runtime library to be the
 single-threaded rather than the multi-threaded one (this seems to be
 required to prevent linking errors later on). Then I simply built the
 areslib project (the other projects adig/ahost seem to fail under MSVC).

 Next was libcurl. I opened `lib/config-win32.h` and added:
 `#define USE_ARES 1`

 Next, I added the path for the ares includes to the include path, and
 libares.lib to the libraries.

 Lastly, I also changed libcurl to be single-threaded rather than
 multi-threaded, again to prevent some duplicate symbol errors. I'm not sure
 why I needed to change everything to single-threaded, but when I didn't, I
 got redefinition errors for several CRT functions (`malloc()`, `stricmp()`,
 etc.)
597
598<a name="curl_off_t"></a>
599`curl_off_t`
600==========
601
 `curl_off_t` is a data type provided by the external libcurl include
 headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
 options whose names end with `_LARGE`. The type is 64 bits wide on most
 modern platforms.
606
607<a name="curlx"></a>
608curlx
609=====
610
 The libcurl source code offers a few functions as source code only. They are
 not part of the official libcurl API, but the source files might be useful
 for others, so apps can optionally compile/build with these sources to gain
 additional functions.
615
616 We provide them through a single header file for easy access for apps:
617 `curlx.h`
618
619`curlx_strtoofft()`
620-------------------
   A macro that converts a string containing a number to a `curl_off_t` number.
   This might use the `curlx_strtoll()` function, which is provided as source
   code in `strtoofft.c`. Note that the function is only provided if no
   `strtoll()` (or equivalent) function exists on your platform. If
   `curl_off_t` is only a 32-bit number on your platform, this macro uses
   `strtol()`.
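   A sketch of the same idea using C99 `strtoll()` directly. `my_off_t`
   stands in for `curl_off_t` so the example builds without curl headers, and
   `my_strtoofft()` is a hypothetical helper with stricter error checking than
   a plain macro would have.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t my_off_t;  /* stand-in for a 64-bit curl_off_t */

/* Parse a whole decimal string into a 64-bit offset.
   Returns 0 on success, -1 on trailing junk, empty input or overflow. */
static int my_strtoofft(const char *str, my_off_t *out)
{
  char *end;
  long long v;

  errno = 0;
  v = strtoll(str, &end, 10);
  if(end == str || *end != '\0' || errno == ERANGE)
    return -1;
  *out = (my_off_t)v;
  return 0;
}
```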
626
627Future
628------
629
630 Several functions will be removed from the public `curl_` name space in a
631 future libcurl release. They will then only become available as `curlx_`
632 functions instead. To make the transition easier, we already today provide
633 these functions with the `curlx_` prefix to allow sources to be built
634 properly with the new function names. The concerned functions are:
635
636 - `curlx_getenv`
637 - `curlx_strequal`
638 - `curlx_strnequal`
639 - `curlx_mvsnprintf`
640 - `curlx_msnprintf`
641 - `curlx_maprintf`
642 - `curlx_mvaprintf`
643 - `curlx_msprintf`
644 - `curlx_mprintf`
645 - `curlx_mfprintf`
646 - `curlx_mvsprintf`
647 - `curlx_mvprintf`
648 - `curlx_mvfprintf`
649
650<a name="contentencoding"></a>
651Content Encoding
652================
653
654## About content encodings
655
656 [HTTP/1.1][4] specifies that a client may request that a server encode its
657 response. This is usually used to compress a response using one (or more)
658 encodings from a set of commonly available compression techniques. These
659 schemes include `deflate` (the zlib algorithm), `gzip`, `br` (brotli) and
660 `compress`. A client requests that the server perform an encoding by including
661 an `Accept-Encoding` header in the request document. The value of the header
662 should be one of the recognized tokens `deflate`, ... (there's a way to
663 register new schemes/tokens, see sec 3.5 of the spec). A server MAY honor
664 the client's encoding request. When a response is encoded, the server
665 includes a `Content-Encoding` header in the response. The value of the
666 `Content-Encoding` header indicates which encodings were used to encode the
667 data, in the order in which they were applied.
668
669 It's also possible for a client to attach priorities to different schemes so
670 that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
671 information on the `Accept-Encoding` header. See sec
672 [3.1.2.2 of RFC 7231][15] for more information on the `Content-Encoding`
673 header.
674
675## Supported content encodings
676
 The `deflate`, `gzip` and `br` content encodings are supported by libcurl.
 Both regular and chunked transfers work fine. The zlib library is required
 for the `deflate` and `gzip` encodings, while the brotli decoding library is
 required for the `br` encoding.
681
682## The libcurl interface
683
684 To cause libcurl to request a content encoding use:
685
686  [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)
687
688 where string is the intended value of the `Accept-Encoding` header.
689
 Currently, libcurl does support multiple encodings, but it only
 understands how to process responses that use the `deflate`, `gzip` and/or
 `br` content encodings, so the only values for [`CURLOPT_ACCEPT_ENCODING`][5]
 that will work (besides `identity`, which does nothing) are `deflate`,
 `gzip` and `br`. If a response is encoded using `compress` or any other
 unsupported method, libcurl will return an error indicating that the
 response could not be decoded. If `string` is NULL, no `Accept-Encoding`
 header is generated. If `string` is a zero-length string, then an
 `Accept-Encoding` header containing all supported encodings will be
 generated.
699
700 The [`CURLOPT_ACCEPT_ENCODING`][5] must be set to any non-NULL value for
701 content to be automatically decoded.  If it is not set and the server still
702 sends encoded content (despite not having been asked), the data is returned
703 in its raw form and the `Content-Encoding` type is not checked.
704
705## The curl interface
706
707 Use the [`--compressed`][6] option with curl to cause it to ask servers to
708 compress responses using any format supported by curl.
709
710<a name="hostip"></a>
711`hostip.c` explained
712====================
713
 The main compile-time defines to keep in mind when reading the `host*.c`
 source files are these:
716
717## `CURLRES_IPV6`
718
719 this host has `getaddrinfo()` and family, and thus we use that. The host may
720 not be able to resolve IPv6, but we don't really have to take that into
721 account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.
722
723## `CURLRES_ARES`
724
725 is defined if libcurl is built to use c-ares for asynchronous name
726 resolves. This can be Windows or \*nix.
727
728## `CURLRES_THREADED`
729
730 is defined if libcurl is built to use threading for asynchronous name
731 resolves. The name resolve will be done in a new thread, and the supported
732 asynch API will be the same as for ares-builds. This is the default under
733 (native) Windows.
734
 If either of the two previous ones is defined, `CURLRES_ASYNCH` is defined
 too. If libcurl is not built to use an asynchronous resolver,
 `CURLRES_SYNCH` is defined.
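 The relationship between these defines can be sketched as plain
 preprocessor logic. This is an illustration only; the actual definitions in
 libcurl's headers differ in detail. `resolver_kind()` is a hypothetical
 helper for checking which branch a build took.

```c
/* Derive the umbrella define from the resolver-specific ones. */
#if defined(CURLRES_ARES) || defined(CURLRES_THREADED)
#  define CURLRES_ASYNCH  /* some asynchronous resolver is built in */
#else
#  define CURLRES_SYNCH   /* plain blocking name resolves */
#endif

static const char *resolver_kind(void)
{
#ifdef CURLRES_ASYNCH
  return "asynchronous";
#else
  return "synchronous";
#endif
}
```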
738
739## `host*.c` sources
740
 The `host*.c` source files are split up like this:
742
743 - `hostip.c`      - method-independent resolver functions and utility functions
744 - `hostasyn.c`    - functions for asynchronous name resolves
745 - `hostsyn.c`     - functions for synchronous name resolves
746 - `asyn-ares.c`   - functions for asynchronous name resolves using c-ares
747 - `asyn-thread.c` - functions for asynchronous name resolves using threads
748 - `hostip4.c`     - IPv4 specific functions
749 - `hostip6.c`     - IPv6 specific functions
750
 `hostip.h` is the single unified header file for all this. It defines the
 `CURLRES_*` defines based on the `config*.h` and `curl_setup.h` defines.
753
754<a name="memoryleak"></a>
755Track Down Memory Leaks
756=======================
757
758## Single-threaded
759
  Please note that this memory leak system is not adjusted to work in more
  than one thread. If you want/need to use it in a multi-threaded app, please
  adjust accordingly.
763
764## Build
765
766  Rebuild libcurl with `-DCURLDEBUG` (usually, rerunning configure with
767  `--enable-debug` fixes this). `make clean` first, then `make` so that all
768  files are actually rebuilt properly. It will also make sense to build
769  libcurl with the debug option (usually `-g` to the compiler) so that
770  debugging it will be easier if you actually do find a leak in the library.
771
772  This will create a library that has memory debugging enabled.
773
774## Modify Your Application
775
776  Add a line in your application code:
777
    curl_dbg_memdebug("dump");
779
  This will make the malloc debug system output a full trace of all
  resource-using functions to the given file name. Make sure you rebuild your
  program and that you link with the same libcurl you built for this purpose
  as described above.
784
785## Run Your Application
786
787  Run your program as usual. Watch the specified memory trace file grow.
788
  Make your program exit and use the proper libcurl cleanup functions etc., so
  that all non-leaks are returned/freed properly.
791
792## Analyze the Flow
793
  Use the `tests/memanalyze.pl` Perl script to analyze the dump file:
795
796    tests/memanalyze.pl dump
797
  This outputs a report on which resources were allocated but never freed,
  etc. The report is well suited for posting to the mailing list!
800
  If this doesn't produce any output, no leak was detected in libcurl. Then
  the leak is most likely in your code.
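
  The core idea of the memory debugging system can be illustrated in plain C:
  record every allocation, drop the record when it is freed, and report
  whatever is left at exit. This is not libcurl's `memdebug.c`; the names
  `dbg_malloc()`, `dbg_free()`, `dbg_outstanding()` and `DBG_MALLOC()` are
  hypothetical, and the sketch keeps an in-memory table instead of writing a
  trace file for `memanalyze.pl` to parse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory allocation log. libcurl's real system writes a
   trace file that tests/memanalyze.pl analyzes afterwards. */
#define MAX_TRACKED 1024

struct alloc_record {
  void *ptr;
  size_t size;
  const char *file;
  int line;
};

static struct alloc_record records[MAX_TRACKED];
static size_t num_records;

void *dbg_malloc(size_t size, const char *file, int line)
{
  void *p = malloc(size);
  if(p) {
    assert(num_records < MAX_TRACKED);
    records[num_records].ptr = p;
    records[num_records].size = size;
    records[num_records].file = file;
    records[num_records].line = line;
    num_records++;
  }
  return p;
}

void dbg_free(void *p)
{
  size_t i;
  if(!p)
    return;
  for(i = 0; i < num_records; i++) {
    if(records[i].ptr == p) {
      /* forget this allocation by moving the last record into its slot */
      records[i] = records[--num_records];
      break;
    }
  }
  free(p);
}

/* Print every allocation that was never freed and return how many remain. */
size_t dbg_outstanding(FILE *out)
{
  size_t i;
  for(i = 0; i < num_records; i++)
    fprintf(out, "LEAK: %zu bytes at %s:%d\n", records[i].size,
            records[i].file, records[i].line);
  return num_records;
}

/* Record the call site automatically. */
#define DBG_MALLOC(n) dbg_malloc((n), __FILE__, __LINE__)
```

  Calling `dbg_outstanding()` right before the program exits plays roughly
  the same role as running `memanalyze.pl` on the dump file.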
803
804<a name="multi_socket"></a>
805`multi_socket`
806==============
807
808 Implementation of the `curl_multi_socket` API
809
810 The main ideas of this API are simply:
811
 1. The application can use whatever event system it likes, as it gets info
    from libcurl about which file descriptors libcurl waits on and for what
    action. (The previous API returns `fd_sets`, which is very
    `select()`-centric.)
816
 2. When the application discovers action on a single socket, it calls
    libcurl and informs it that there was action on this particular socket,
    and libcurl can then act on that socket/transfer only and not care about
    any other transfers. (The previous API always had to scan through all
    the existing transfers.)
822
 The idea is that [`curl_multi_socket_action()`][7] calls a given callback
 with information about which socket to wait on and for what action, and the
 callback only gets called if the status of that socket has changed.
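
 A minimal sketch of the "only call back on change" rule, assuming a plain
 array indexed by socket number: libcurl remembers what it last told the
 application about each socket and stays quiet when nothing changed. The
 `want` values are modelled on the public `CURL_POLL_*` constants and the real
 callback is installed with `CURLMOPT_SOCKETFUNCTION`, but `struct
 sock_state`, `sock_update()` and the fixed `MAX_SOCKETS` limit are
 hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action values, modelled on libcurl's CURL_POLL_* set. */
enum want { WANT_NONE = 0, WANT_IN = 1, WANT_OUT = 2, WANT_INOUT = 3,
            WANT_REMOVE = 4 };

#define MAX_SOCKETS 64

typedef void (*socket_cb)(int sock, enum want what, void *userp);

struct sock_state {
  enum want last[MAX_SOCKETS]; /* what the application was last told */
  socket_cb cb;                /* application callback, may be NULL */
  void *userp;
};

void sock_state_init(struct sock_state *s, socket_cb cb, void *userp)
{
  int i;
  for(i = 0; i < MAX_SOCKETS; i++)
    s->last[i] = WANT_NONE;
  s->cb = cb;
  s->userp = userp;
}

/* Report what a transfer now waits for on 'sock'. Returns 1 if the
   application was notified, 0 if the status was unchanged. */
int sock_update(struct sock_state *s, int sock, enum want now)
{
  assert(sock >= 0 && sock < MAX_SOCKETS);
  assert(now != WANT_REMOVE);   /* callers pass WANT_NONE to stop waiting */
  if(s->last[sock] == now)
    return 0;                   /* unchanged: no callback */
  s->last[sock] = now;
  if(s->cb)                     /* "nothing" is reported as a removal */
    s->cb(sock, now == WANT_NONE ? WANT_REMOVE : now, s->userp);
  return 1;
}
```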
826
 We also added a timer callback that makes libcurl call the application when
 the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
 and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
 internally there's an added struct to each easy handle in which we store
 an "expire time" (if any). The structs are then "splay sorted" so that we
 can add and remove times from the tree and yet somewhat swiftly figure out
 both how long there is until the next nearest timer expires and which timer
 (handle) we should take care of now. Of course, the upside of all this is
 that we get a [`curl_multi_timeout()`][8] that should also work with
 old-style applications that use [`curl_multi_perform()`][11].
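
 The timer bookkeeping can be sketched as follows. This is a simplification:
 libcurl sorts the expire times in a splay tree, while this sketch uses a
 sorted singly linked list, which answers the same two questions (how long
 until the nearest timer, and which handle owns it) with slower inserts. The
 `timer_*` names and the millisecond clock are assumptions; `timer_timeout()`
 follows the `curl_multi_timeout()` convention of reporting -1 when no timer
 is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-handle timer node, kept in a list sorted by expiry
   (a stand-in for libcurl's splay tree). */
struct timer_node {
  long expire_ms;            /* absolute expiry time in milliseconds */
  int handle_id;             /* which transfer the timer belongs to */
  struct timer_node *next;
};

/* Insert 'node' so the list stays sorted, earliest expiry first. */
void timer_add(struct timer_node **head, struct timer_node *node)
{
  assert(node != NULL);
  while(*head && (*head)->expire_ms <= node->expire_ms)
    head = &(*head)->next;
  node->next = *head;
  *head = node;
}

/* Unlink 'node' if it is in the list. */
void timer_remove(struct timer_node **head, struct timer_node *node)
{
  while(*head) {
    if(*head == node) {
      *head = node->next;
      node->next = NULL;
      return;
    }
    head = &(*head)->next;
  }
}

/* Milliseconds until the nearest timer, 0 if it has already expired,
   or -1 when no timer is set. */
long timer_timeout(const struct timer_node *head, long now_ms)
{
  if(!head)
    return -1;
  return head->expire_ms > now_ms ? head->expire_ms - now_ms : 0;
}
```

 The handle to take care of next is always the list head, which is the
 property the splay tree provides more efficiently.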
837
 We created an internal "socket to easy handles" hash table that, given a
 socket (file descriptor), returns the easy handle that waits for action on
 that socket. This hash is made using the already existing hash code
 (previously only used for the DNS cache).
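
 A socket-to-handle lookup of this kind can be sketched as a small chained
 hash table. The `sockhash_*` functions, the slot count and the use of an
 opaque `void *` for the easy handle are all hypothetical; libcurl builds its
 version on its generic hash code instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chained hash table: socket descriptor -> easy handle. */
#define SOCKHASH_SLOTS 31

struct sock_entry {
  int sock;
  void *easy;
  struct sock_entry *next;
};

struct sockhash {
  struct sock_entry *slots[SOCKHASH_SLOTS];
};

static size_t slot_for(int sock)
{
  return (size_t)sock % SOCKHASH_SLOTS;
}

/* Add or update a mapping. Returns 0 on success, -1 if out of memory. */
int sockhash_put(struct sockhash *h, int sock, void *easy)
{
  struct sock_entry *e;
  assert(sock >= 0);
  for(e = h->slots[slot_for(sock)]; e; e = e->next) {
    if(e->sock == sock) {
      e->easy = easy;
      return 0;
    }
  }
  e = malloc(sizeof(*e));
  if(!e)
    return -1;
  e->sock = sock;
  e->easy = easy;
  e->next = h->slots[slot_for(sock)];
  h->slots[slot_for(sock)] = e;
  return 0;
}

/* Return the handle waiting on 'sock', or NULL if none. */
void *sockhash_get(const struct sockhash *h, int sock)
{
  const struct sock_entry *e;
  for(e = h->slots[slot_for(sock)]; e; e = e->next)
    if(e->sock == sock)
      return e->easy;
  return NULL;
}

void sockhash_remove(struct sockhash *h, int sock)
{
  struct sock_entry **pp = &h->slots[slot_for(sock)];
  while(*pp) {
    if((*pp)->sock == sock) {
      struct sock_entry *dead = *pp;
      *pp = dead->next;
      free(dead);
      return;
    }
    pp = &(*pp)->next;
  }
}
```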
842
843 To make libcurl able to report plain sockets in the socket callback, we had
 to re-organize the internals of the [`curl_multi_fdset()`][12] etc. so that
845 the conversion from sockets to `fd_sets` for that function is only done in
846 the last step before the data is returned. I also had to extend c-ares to
847 get a function that can return plain sockets, as that library too returned
848 only `fd_sets` and that is no longer good enough. The changes done to c-ares
849 are available in c-ares 1.3.1 and later.
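
 Keeping plain sockets internally and producing `fd_sets` only at the very
 end can be sketched like this. The `struct sock_wait` record and
 `waits_to_fdsets()` are hypothetical; like `curl_multi_fdset()`, the function
 only adds to sets the caller has already cleared, and reports -1 as the
 highest descriptor when there is nothing to wait for.

```c
#include <assert.h>
#include <sys/select.h>

/* Hypothetical record of what one transfer waits for on one socket. */
struct sock_wait {
  int sock;
  int want_read;
  int want_write;
};

/* Last-step conversion from plain sockets to select()-style fd_sets.
   Returns the highest descriptor added, or -1 if none. */
int waits_to_fdsets(const struct sock_wait *waits, int count,
                    fd_set *readfds, fd_set *writefds)
{
  int i;
  int maxfd = -1;
  for(i = 0; i < count; i++) {
    const struct sock_wait *w = &waits[i];
    assert(w->sock >= 0 && w->sock < FD_SETSIZE);
    if(w->want_read)
      FD_SET(w->sock, readfds);
    if(w->want_write)
      FD_SET(w->sock, writefds);
    if((w->want_read || w->want_write) && w->sock > maxfd)
      maxfd = w->sock;
  }
  return maxfd;
}
```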
850
851<a name="structs"></a>
852Structs in libcurl
853==================
854
855This section should cover 7.32.0 pretty accurately, but will make sense even
856for older and later versions as things don't change drastically that often.
857
858<a name="Curl_easy"></a>
859## Curl_easy
860
  The `Curl_easy` struct is the one returned to the outside in the external API
  as a `CURL *`. This is usually known as an easy handle in API documentation
  and examples.
864
865  Information and state that is related to the actual connection is in the
866  `connectdata` struct. When a transfer is about to be made, libcurl will
867  either create a new connection or re-use an existing one. The particular
  connectdata that is used by this handle is pointed to by
  `Curl_easy->easy_conn`.
870
  Data and information regarding this particular single transfer are put in
  the `SingleRequest` sub-struct.
873
874  When the `Curl_easy` struct is added to a multi handle, as it must be in
875  order to do any transfer, the `->multi` member will point to the `Curl_multi`
876  struct it belongs to. The `->prev` and `->next` members will then be used by
877  the multi code to keep a linked list of `Curl_easy` structs that are added to
878  that same multi handle. libcurl always uses multi so `->multi` *will* point
879  to a `Curl_multi` when a transfer is in progress.
880
881  `->mstate` is the multi state of this particular `Curl_easy`. When
882  `multi_runsingle()` is called, it will act on this handle according to which
883  state it is in. The mstate is also what tells which sockets to return for a
884  specific `Curl_easy` when [`curl_multi_fdset()`][12] is called etc.
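
  How the state decides what to wait for can be sketched with a reduced state
  set. The states, the `WAIT_*` bits and `struct easy` below are hypothetical
  simplifications, not libcurl's real multi states; the one firm fact used is
  that a non-blocking `connect()` has completed when the socket becomes
  writable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced set of multi states. */
enum mstate { ST_INIT, ST_CONNECTING, ST_DOING, ST_PERFORM, ST_DONE };

#define WAIT_READ  1
#define WAIT_WRITE 2

struct easy {
  enum mstate mstate;
  int sending;    /* request body still being uploaded */
  int receiving;  /* response still being downloaded */
};

/* Which socket action(s) a transfer waits for, decided by its state. */
int wait_bits_for(const struct easy *data)
{
  assert(data != NULL);
  switch(data->mstate) {
  case ST_CONNECTING:
    return WAIT_WRITE;   /* non-blocking connect() finishes when writable */
  case ST_DOING:
    return WAIT_READ;    /* e.g. waiting for the server's reply */
  case ST_PERFORM:
    return (data->sending ? WAIT_WRITE : 0) |
           (data->receiving ? WAIT_READ : 0);
  case ST_INIT:
  case ST_DONE:
  default:
    return 0;            /* no socket to wait on in these states */
  }
}
```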
885
  The libcurl source code generally uses the name `data` for the variable that
  points to the `Curl_easy`.
888
889  When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated with
890  an individual stream, sharing the same connectdata struct. Multiplexing
891  makes it even more important to keep things associated with the right thing!
892
893<a name="connectdata"></a>
894## connectdata
895
  A general idea in libcurl is to keep connections around in a connection
  "cache" after they have been used, in case they will be used again, and
  then re-use an existing one instead of creating a new one, as that gives a
  significant performance boost.
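
  A minimal sketch of the reuse decision, assuming a connection is identified
  only by host name and port: look for an idle cached connection to the same
  place first, and create a new one only if none is found. `struct conn`,
  `struct conn_cache`, `conn_get()` and `conn_release()` are hypothetical; the
  real matching considers many more properties of a connection.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HOST_MAX 64

/* Hypothetical, minimal connection record and cache. */
struct conn {
  char host[HOST_MAX];
  int port;
  int in_use;          /* currently owned by a transfer */
  struct conn *next;
};

struct conn_cache {
  struct conn *list;
  int created;         /* how many new connections had to be made */
};

/* Return an idle cached connection to host:port, or create a new one. */
struct conn *conn_get(struct conn_cache *cache, const char *host, int port)
{
  struct conn *c;
  assert(strlen(host) < HOST_MAX);
  for(c = cache->list; c; c = c->next) {
    if(!c->in_use && c->port == port && !strcmp(c->host, host)) {
      c->in_use = 1;   /* re-use: no new connect or handshake needed */
      return c;
    }
  }
  c = calloc(1, sizeof(*c));
  if(!c)
    return NULL;
  strcpy(c->host, host);
  c->port = port;
  c->in_use = 1;
  c->next = cache->list;
  cache->list = c;
  cache->created++;
  return c;
}

/* Hand the connection back so a later transfer can use it. */
void conn_release(struct conn *c)
{
  c->in_use = 0;
}
```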
900
901  Each `connectdata` identifies a single physical connection to a server. If
902  the connection can't be kept alive, the connection will be closed after use
903  and then this struct can be removed from the cache and freed.
904
905  Thus, the same `Curl_easy` can be used multiple times and each time select
906  another `connectdata` struct to use for the connection. Keep this in mind,
907  as it is then important to consider if options or choices are based on the
908  connection or the `Curl_easy`.
909
910  Functions in libcurl will assume that `connectdata->data` points to the
911  `Curl_easy` that uses this connection (for the moment).
912
913  As a special complexity, some protocols supported by libcurl require a
914  special disconnect procedure that is more than just shutting down the
915  socket. It can involve sending one or more commands to the server before
916  doing so. Since connections are kept in the connection cache after use, the
917  original `Curl_easy` may no longer be around when the time comes to shut down
918  a particular connection. For this purpose, libcurl holds a special dummy
919  `closure_handle` `Curl_easy` in the `Curl_multi` struct to use when needed.
920
921  FTP uses two TCP connections for a typical transfer but it keeps both in
922  this single struct and thus can be considered a single connection for most
923  internal concerns.
924
  The libcurl source code generally uses the name `conn` for the variable that
  points to the connectdata.
927
928<a name="Curl_multi"></a>
929## Curl_multi
930
931  Internally, the easy interface is implemented as a wrapper around multi
932  interface functions. This makes everything multi interface.
933
934  `Curl_multi` is the multi handle struct exposed as `CURLM *` in external
935  APIs.
936
937  This struct holds a list of `Curl_easy` structs that have been added to this
938  handle with [`curl_multi_add_handle()`][13]. The start of the list is
939  `->easyp` and `->num_easy` is a counter of added `Curl_easy`s.
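
  The list bookkeeping described for `Curl_easy` and `Curl_multi` can be
  sketched with stripped-down structs. Only the `multi`, `next`, `prev`,
  `easyp` and `num_easy` members come from the text above; the `easylp` tail
  pointer and the `multi_add()`/`multi_remove()` functions are hypothetical
  helpers, not the real `curl_multi_add_handle()` implementation.

```c
#include <assert.h>
#include <stddef.h>

struct multi;

/* Hypothetical cut-down Curl_easy: only the list links. */
struct easy {
  struct multi *multi;   /* the multi handle this easy is added to */
  struct easy *next;
  struct easy *prev;
};

/* Hypothetical cut-down Curl_multi. */
struct multi {
  struct easy *easyp;    /* first easy handle in the list */
  struct easy *easylp;   /* last one, so appending is O(1) */
  int num_easy;
};

void multi_add(struct multi *m, struct easy *data)
{
  assert(data->multi == NULL);   /* an easy belongs to one multi at a time */
  data->multi = m;
  data->next = NULL;
  data->prev = m->easylp;
  if(m->easylp)
    m->easylp->next = data;
  else
    m->easyp = data;
  m->easylp = data;
  m->num_easy++;
}

void multi_remove(struct multi *m, struct easy *data)
{
  assert(data->multi == m);
  if(data->prev)
    data->prev->next = data->next;
  else
    m->easyp = data->next;
  if(data->next)
    data->next->prev = data->prev;
  else
    m->easylp = data->prev;
  data->multi = NULL;
  data->next = data->prev = NULL;
  m->num_easy--;
}
```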
940
941  `->msglist` is a linked list of messages to send back when
942  [`curl_multi_info_read()`][14] is called. Basically a node is added to that
943  list when an individual `Curl_easy`'s transfer has completed.
944
945  `->hostcache` points to the name cache. It is a hash table for looking up
  name to IP. The nodes have a limited lifetime in there and this cache is
947  meant to reduce the time for when the same name is wanted within a short
948  period of time.
949
950  `->timetree` points to a tree of `Curl_easy`s, sorted by the remaining time
951  until it should be checked - normally some sort of timeout. Each `Curl_easy`
952  has one node in the tree.
953
  `->sockhash` is a hash table that allows fast lookups from a socket
  descriptor to the `Curl_easy` that uses that descriptor. This is necessary
  for the `multi_socket` API.
957
958  `->conn_cache` points to the connection cache. It keeps track of all
959  connections that are kept after use. The cache has a maximum size.
960
961  `->closure_handle` is described in the `connectdata` section.
962
  The libcurl source code generally uses the name `multi` for the variable that
  points to the `Curl_multi` struct.
965
966<a name="Curl_handler"></a>
967## Curl_handler
968
969  Each unique protocol that is supported by libcurl needs to provide at least
970  one `Curl_handler` struct. It defines what the protocol is called and what
971  functions the main code should call to deal with protocol specific issues.
  In general, there's a source file named `[protocol].c` in which a
  `struct Curl_handler Curl_handler_[protocol]` is declared. In `url.c` there's
  then the main array, which points to all the individual `Curl_handler`
  structs and is scanned through when a URL is given to libcurl to work with.
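
  The table-driven dispatch can be sketched like this. `struct handler`, its
  members, `find_handler()` and the `do_it` callbacks that merely return a
  command name are hypothetical; the sketch only shows the pattern of one
  struct per protocol, a single NULL-terminated array, and a case-insensitive
  scheme match.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical cut-down Curl_handler. */
struct handler {
  const char *scheme;            /* URL scheme, spelled in uppercase */
  int defport;                   /* default port */
  const char *(*do_it)(void);    /* hypothetical: names the request sent */
};

static const char *http_do(void) { return "GET"; }
static const char *ftp_do(void) { return "RETR"; }

/* HTTPS gets its own handler but can reuse the HTTP functions. */
static const struct handler handler_http = { "HTTP", 80, http_do };
static const struct handler handler_https = { "HTTPS", 443, http_do };
static const struct handler handler_ftp = { "FTP", 21, ftp_do };

/* The single array scanned when a URL is given; NULL marks the end. */
static const struct handler *const protocols[] = {
  &handler_http, &handler_https, &handler_ftp, NULL
};

/* ASCII case-insensitive string equality. */
static int scheme_eq(const char *a, const char *b)
{
  while(*a && *b) {
    if(tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;
}

/* Find the handler for a scheme; NULL if the scheme is unsupported. */
const struct handler *find_handler(const char *scheme)
{
  size_t i;
  assert(scheme != NULL);
  for(i = 0; protocols[i]; i++)
    if(scheme_eq(protocols[i]->scheme, scheme))
      return protocols[i];
  return NULL;
}
```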
977
  `->scheme` is the URL scheme name, usually spelled out in uppercase:
  "HTTP", "FTP" etc. SSL versions of the protocol need their own
  `Curl_handler` setup, so HTTPS is separate from HTTP.
981
982  `->setup_connection` is called to allow the protocol code to allocate
983  protocol specific data that then gets associated with that `Curl_easy` for
984  the rest of this transfer. It gets freed again at the end of the transfer.
985  It will be called before the `connectdata` for the transfer has been
  selected/created. Most protocols will allocate their private
  `struct [PROTOCOL]` here and assign `Curl_easy->req.protop` to point to it.
988
989  `->connect_it` allows a protocol to do some specific actions after the TCP
990  connect is done, that can still be considered part of the connection phase.
991
992  Some protocols will alter the `connectdata->recv[]` and
993  `connectdata->send[]` function pointers in this function.
994
995  `->connecting` is similarly a function that keeps getting called as long as
996  the protocol considers itself still in the connecting phase.
997
  `->do_it` is the function called to issue the transfer request, what we
  internally call the DO action. If the DO is not enough and things need to
  keep getting done for the entire DO sequence to complete, `->doing` is then
  usually also provided. Each protocol that needs to do multiple commands or
  similar for do/doing needs to implement its own state machine (see SCP,
  SFTP, FTP). Some protocols (only FTP and only due to historical reasons)
  have a separate piece of the DO state called `DO_MORE`.
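
  A protocol-private DO/DOING state machine can be sketched as below, assuming
  an FTP-like sequence of three commands where each step waits for the
  server's reply before moving on. The states, `struct proto_conn` and the
  `reply_arrived` parameter are hypothetical; libcurl's real callbacks report
  completion through a `bool *done` argument and read actual server replies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command sequence of an FTP-like protocol. */
enum proto_state { PS_TYPE, PS_PASV, PS_RETR, PS_STOP };

struct proto_conn {
  enum proto_state state;
  int commands_sent;
};

/* ->do_it: issue the first command; more follow, so DO is not done yet. */
void proto_do_it(struct proto_conn *pc, bool *done)
{
  assert(pc != NULL && done != NULL);
  pc->state = PS_TYPE;
  pc->commands_sent = 1;   /* pretend "TYPE I" was sent */
  *done = false;           /* the engine moves on to DOING */
}

/* ->doing: called repeatedly; advances one step per server reply and sets
   *done once the last command has been answered. */
void proto_doing(struct proto_conn *pc, bool reply_arrived, bool *done)
{
  assert(pc != NULL && done != NULL);
  *done = false;
  if(!reply_arrived)
    return;                /* nothing to act on yet */
  switch(pc->state) {
  case PS_TYPE:
    pc->state = PS_PASV;   /* pretend "PASV" was sent */
    pc->commands_sent++;
    break;
  case PS_PASV:
    pc->state = PS_RETR;   /* pretend "RETR file" was sent */
    pc->commands_sent++;
    break;
  case PS_RETR:
  case PS_STOP:
    pc->state = PS_STOP;   /* last reply received: DO sequence complete */
    *done = true;
    break;
  }
}
```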
1005
  `->doing` keeps getting called while issuing the transfer request command(s).
1007
1008  `->done` gets called when the transfer is complete and DONE. That's after the
1009  main data has been transferred.
1010
1011  `->do_more` gets called during the `DO_MORE` state. The FTP protocol uses
1012  this state when setting up the second connection.
1013
  `->proto_getsock`, `->doing_getsock`, `->domore_getsock` and
  `->perform_getsock` are functions that return socket information: which
  socket(s) to wait for which action(s) during the particular multi state.
1020
  `->disconnect` is called immediately before the TCP connection is shut down.
1022
  `->readwrite` gets called during transfer to allow the protocol to do extra
  reads/writes.
1025
  `->defport` is the default TCP or UDP port this protocol uses.
1027
  `->protocol` is one or more bits in the `CURLPROTO_*` set. The SSL versions
  have their "base" protocol set and then the SSL variation, like
  "HTTP|HTTPS".
1031
1032  `->flags` is a bitmask with additional information about the protocol that will
1033  make it get treated differently by the generic engine:
1034
1035  - `PROTOPT_SSL` - will make it connect and negotiate SSL
1036
1037  - `PROTOPT_DUAL` - this protocol uses two connections
1038
1039  - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
1040    connection. This flag is no longer used by code, yet still set for a bunch
1041    of protocol handlers.
1042
1043  - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
1044    limit which "direction" of socket actions that the main engine will
1045    concern itself with.
1046
  - `PROTOPT_NONETWORK` - a protocol that doesn't use the network (read: `file:`)
1048
1049  - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
1050    one unless one is provided
1051
1052  - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
1053    (?foo=bar)
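
  Testing these bits is plain bitmask arithmetic. In the sketch below the
  `PROTOPT_*` names follow the list above, but their numeric values, the
  `PROTO_*` bits standing in for `CURLPROTO_*`, `struct handler_info`, the
  helper functions and the flag combinations given to the two example
  handlers are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative values only; libcurl defines its own internally. */
#define PROTOPT_SSL         (1u << 0)
#define PROTOPT_DUAL        (1u << 1)
#define PROTOPT_CLOSEACTION (1u << 2)
#define PROTOPT_DIRLOCK     (1u << 3)
#define PROTOPT_NONETWORK   (1u << 4)
#define PROTOPT_NEEDSPWD    (1u << 5)
#define PROTOPT_NOURLQUERY  (1u << 6)

/* Hypothetical protocol bits standing in for CURLPROTO_*. */
#define PROTO_HTTP  (1u << 0)
#define PROTO_HTTPS (1u << 1)
#define PROTO_FILE  (1u << 2)

struct handler_info {
  const char *scheme;
  unsigned int protocol;  /* base protocol plus SSL variant, like ->protocol */
  unsigned int flags;     /* PROTOPT_* bits, like ->flags */
};

static const struct handler_info https_info = {
  "HTTPS", PROTO_HTTP | PROTO_HTTPS, PROTOPT_SSL
};
static const struct handler_info file_info = {
  "FILE", PROTO_FILE, PROTOPT_NONETWORK | PROTOPT_NOURLQUERY
};

/* Should the generic engine negotiate SSL/TLS after connecting? */
bool wants_ssl(const struct handler_info *h)
{
  assert(h != NULL);
  return (h->flags & PROTOPT_SSL) != 0;
}

/* Is this handler in the HTTP family (HTTP or HTTPS)? */
bool is_http_family(const struct handler_info *h)
{
  assert(h != NULL);
  return (h->protocol & PROTO_HTTP) != 0;
}

/* Refuse a query part ("?foo=bar") for protocols that can't handle one. */
bool url_query_ok(const struct handler_info *h, const char *url)
{
  assert(h != NULL && url != NULL);
  return !(h->flags & PROTOPT_NOURLQUERY) || strchr(url, '?') == NULL;
}
```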
1054
1055<a name="conncache"></a>
1056## conncache
1057
  This is a hash table with connections for later re-use. Each `Curl_easy` has a
1059  pointer to its connection cache. Each multi handle sets up a connection
1060  cache that all added `Curl_easy`s share by default.
1061
1062<a name="Curl_share"></a>
1063## Curl_share
1064
1065  The libcurl share API allocates a `Curl_share` struct, exposed to the
1066  external API as `CURLSH *`.
1067
1068  The idea is that the struct can have a set of its own versions of caches and
1069  pools and then by providing this struct in the `CURLOPT_SHARE` option, those
1070  specific `Curl_easy`s will use the caches/pools that this share handle
1071  holds.
1072
1073  Then individual `Curl_easy` structs can be made to share specific things
1074  that they otherwise wouldn't, such as cookies.
1075
1076  The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
1077  session cache.
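
  Which cache a transfer ends up using can be sketched for the cookie case. In
  the public API an application would call `curl_share_setopt()` with
  `CURLSHOPT_SHARE` and `CURL_LOCK_DATA_COOKIE`, then pass the share handle
  with `CURLOPT_SHARE`; the structs and `cookies_for()` below are hypothetical
  stand-ins for `Curl_share`, `CookieInfo` and the internal lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cookie store standing in for CookieInfo. */
struct cookie_store {
  int num_cookies;
};

/* Hypothetical share handle: only the caches it shares are set. */
struct share {
  struct cookie_store *cookies;    /* NULL unless cookies are shared */
};

struct easy {
  struct cookie_store own_cookies; /* private store every handle has */
  struct share *share;             /* set via the CURLOPT_SHARE option */
};

/* Pick the cookie store a transfer should use. */
struct cookie_store *cookies_for(struct easy *data)
{
  assert(data != NULL);
  if(data->share && data->share->cookies)
    return data->share->cookies;
  return &data->own_cookies;
}
```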
1078
1079<a name="CookieInfo"></a>
1080## CookieInfo
1081
1082  This is the main cookie struct. It holds all known cookies and related
1083  information. Each `Curl_easy` has its own private `CookieInfo` even when
1084  they are added to a multi handle. They can be made to share cookies by using
1085  the share API.
1086
1087
1088[1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
1089[2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
1090[3]: https://c-ares.haxx.se/
1091[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
1092[5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
1093[6]: https://curl.haxx.se/docs/manpage.html#--compressed
1094[7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
1095[8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
1096[9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
1097[10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
1098[11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
1099[12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
1100[13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
1101[14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
1102[15]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2
1103