curl internals
==============
3
4 - [Intro](#intro)
5 - [git](#git)
6 - [Portability](#Portability)
7 - [Windows vs Unix](#winvsunix)
8 - [Library](#Library)
9   - [`Curl_connect`](#Curl_connect)
10   - [`multi_do`](#multi_do)
11   - [`Curl_readwrite`](#Curl_readwrite)
12   - [`multi_done`](#multi_done)
13   - [`Curl_disconnect`](#Curl_disconnect)
14 - [HTTP(S)](#http)
15 - [FTP](#ftp)
16   - [Kerberos](#kerberos)
17 - [TELNET](#telnet)
18 - [FILE](#file)
19 - [SMB](#smb)
20 - [LDAP](#ldap)
21 - [E-mail](#email)
22 - [General](#general)
23 - [Persistent Connections](#persistent)
24 - [multi interface/non-blocking](#multi)
25 - [SSL libraries](#ssl)
26 - [Library Symbols](#symbols)
27 - [Return Codes and Informationals](#returncodes)
 - [API/ABI](#abi)
29 - [Client](#client)
30 - [Memory Debugging](#memorydebug)
31 - [Test Suite](#test)
32 - [Asynchronous name resolves](#asyncdns)
33   - [c-ares](#cares)
34 - [`curl_off_t`](#curl_off_t)
35 - [curlx](#curlx)
36 - [Content Encoding](#contentencoding)
37 - [hostip.c explained](#hostip)
38 - [Track Down Memory Leaks](#memoryleak)
39 - [`multi_socket`](#multi_socket)
40 - [Structs in libcurl](#structs)
41
42<a name="intro"></a>
43Intro
44=====
45
 This project is split in two: the library and the client. The client part
47 uses the library, but the library is designed to allow other applications to
48 use it.
49
50 The largest amount of code and complexity is in the library part.
51
52
53<a name="git"></a>
54git
55===
56
57 All changes to the sources are committed to the git repository as soon as
58 they're somewhat verified to work. Changes shall be committed as independently
59 as possible so that individual changes can be easily spotted and tracked
60 afterwards.
61
62 Tagging shall be used extensively, and by the time we release new archives we
63 should tag the sources with a name similar to the released version number.
64
65<a name="Portability"></a>
66Portability
67===========
68
 We write curl and libcurl to compile with C89 compilers on 32-bit and larger
 machines. Most of libcurl assumes more or less POSIX compliance, but that's
 not a requirement.

 We write libcurl to build and work with lots of third party tools, and we
 want it to remain functional and buildable with these and later versions
 (older versions may still work, but we don't work hard to maintain that):
76
77Dependencies
78------------
79
80 - OpenSSL      0.9.7
81 - GnuTLS       2.11.3
82 - zlib         1.1.4
83 - libssh2      0.16
84 - c-ares       1.6.0
85 - libidn2      2.0.0
86 - cyassl       2.0.0
87 - openldap     2.0
88 - MIT Kerberos 1.2.4
89 - GSKit        V5R3M0
90 - NSS          3.14.x
91 - PolarSSL     1.3.0
92 - Heimdal      ?
93 - nghttp2      1.0.0
94
95Operating Systems
96-----------------
97
98 On systems where configure runs, we aim at working on them all - if they have
99 a suitable C compiler. On systems that don't run configure, we strive to keep
100 curl running correctly on:
101
102 - Windows      98
103 - AS/400       V5R3M0
104 - Symbian      9.1
105 - Windows CE   ?
106 - TPF          ?
107
108Build tools
109-----------
110
111 When writing code (mostly for generating stuff included in release tarballs)
112 we use a few "build tools" and we make sure that we remain functional with
113 these versions:
114
115 - GNU Libtool  1.4.2
116 - GNU Autoconf 2.57
117 - GNU Automake 1.7
118 - GNU M4       1.4
119 - perl         5.004
120 - roffit       0.5
121 - groff        ? (any version that supports "groff -Tps -man [in] [out]")
122 - ps2pdf (gs)  ?
123
124<a name="winvsunix"></a>
125Windows vs Unix
126===============
127
128 There are a few differences in how to program curl the Unix way compared to
129 the Windows way. Perhaps the four most notable details are:
130
131 1. Different function names for socket operations.
132
133   In curl, this is solved with defines and macros, so that the source looks
134   the same in all places except for the header file that defines them. The
135   macros in use are sclose(), sread() and swrite().
136
137 2. Windows requires a couple of init calls for the socket stuff.
138
139   That's taken care of by the `curl_global_init()` call, but if other libs
140   also do it etc there might be reasons for applications to alter that
141   behaviour.
142
143 3. The file descriptors for network communication and file operations are
144    not as easily interchangeable as in Unix.
145
146   We avoid this by not trying any funny tricks on file descriptors.
147
 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
    destroying binary data, although you do want that conversion if it is
    text coming through... (sigh)

   We set stdout to binary under Windows.
153
 Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
 conditionals that deal with features *should* instead be in the format
 `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
 we maintain a `curl_config-win32.h` file in the lib directory that is supposed
 to look exactly like a `curl_config.h` file would have looked on a Windows
 machine!
160
161 Generally speaking: always remember that this will be compiled on dozens of
162 operating systems. Don't walk on the edge!
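
 The macro approach from item 1 above can be sketched as follows. This is a
 simplified illustration, not the actual libcurl header: the macro names
 `sclose()`, `sread()` and `swrite()` come from the text, while the `sock_t`
 typedef and the `roundtrip()` helper are hypothetical names made up for this
 example.

```c
#include <assert.h>
#include <string.h>

#ifdef _WIN32
#include <winsock2.h>
/* Winsock has its own close function and takes int lengths. */
#define sclose(s)        closesocket(s)
#define sread(s, b, n)   recv((s), (char *)(b), (int)(n), 0)
#define swrite(s, b, n)  send((s), (const char *)(b), (int)(n), 0)
typedef SOCKET sock_t;
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#define sclose(s)        close(s)
#define sread(s, b, n)   recv((s), (b), (n), 0)
#define swrite(s, b, n)  send((s), (b), (n), 0)
typedef int sock_t;
#endif

/* Hypothetical helper: send a message on one socket and read it back on
   its connected peer, using only the portable macros. Returns the number
   of bytes read, or -1 on failure. */
static long roundtrip(sock_t out, sock_t in, const char *msg,
                      char *reply, size_t replylen)
{
  size_t len = strlen(msg);
  long got;

  assert(replylen > len);
  if((long)swrite(out, msg, len) != (long)len)
    return -1;
  got = (long)sread(in, reply, replylen - 1);
  if(got >= 0)
    reply[got] = '\0';
  return got;
}
```

 Code written like `roundtrip()` never mentions a platform; only the block of
 defines at the top differs between Windows and Unix.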
163
164<a name="Library"></a>
165Library
166=======
167
168 (See [Structs in libcurl](#structs) for the separate section describing all
169 major internal structs and their purposes.)
170
171 There are plenty of entry points to the library, namely each publicly defined
172 function that libcurl offers to applications. All of those functions are
173 rather small and easy-to-follow. All the ones prefixed with `curl_easy` are
174 put in the lib/easy.c file.
175
176 `curl_global_init()` and `curl_global_cleanup()` should be called by the
177 application to initialize and clean up global stuff in the library. As of
178 today, it can handle the global SSL initing if SSL is enabled and it can init
179 the socket layer on windows machines. libcurl itself has no "global" scope.
180
181 All printf()-style functions use the supplied clones in lib/mprintf.c. This
182 makes sure we stay absolutely platform independent.
183
 [`curl_easy_init()`][2] allocates an internal struct and makes some
185 initializations.  The returned handle does not reveal internals. This is the
186 `Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
187 functions. All connections performed will get connect-specific data allocated
188 that should be used for things related to particular connections/requests.
189
190 [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
191 be passed in pairs: the parameter-ID and the parameter-value. The list of
192 options is documented in the man page. This function mainly sets things in
193 the `Curl_easy` struct.
194
195 `curl_easy_perform()` is just a wrapper function that makes use of the multi
196 API.  It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
197 `curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
198 and then returns.
199
200 Some of the most important key functions in url.c are called from multi.c
201 when certain key steps are to be made in the transfer operation.
202
203<a name="Curl_connect"></a>
204Curl_connect()
205--------------
206
207   Analyzes the URL, it separates the different components and connects to the
208   remote host. This may involve using a proxy and/or using SSL. The
209   `Curl_resolv()` function in lib/hostip.c is used for looking up host names
210   (it does then use the proper underlying method, which may vary between
211   platforms and builds).
212
   When `Curl_connect` is done, we are connected to the remote site. Then it
   is time to tell the server to get a document/file. `multi_do()` arranges
   this.
216
217   This function makes sure there's an allocated and initiated 'connectdata'
218   struct that is used for this particular connection only (although there may
219   be several requests performed on the same connect). A bunch of things are
220   inited/inherited from the `Curl_easy` struct.
221
222<a name="multi_do"></a>
223multi_do()
224---------
225
226   `multi_do()` makes sure the proper protocol-specific function is called. The
227   functions are named after the protocols they handle.
228
   The protocol-specific functions of course deal with protocol-specific
   negotiations and setup. They have access to the `Curl_sendf()` (from
   lib/sendf.c) function to send printf-style formatted data to the remote
   host and when they're ready to make the actual file transfer they call the
   `Curl_setup_transfer()` function (in lib/transfer.c) to set up the transfer
   and return.
235
   If this DO function fails and the connection is being re-used, libcurl will
   then close this connection, set up a new connection and re-issue the DO
238   request on that. This is because there is no way to be perfectly sure that
239   we have discovered a dead connection before the DO function and thus we
240   might wrongly be re-using a connection that was closed by the remote peer.
241
242<a name="Curl_readwrite"></a>
243Curl_readwrite()
244----------------
245
246   Called during the transfer of the actual protocol payload.
247
248   During transfer, the progress functions in lib/progress.c are called at
249   frequent intervals (or at the user's choice, a specified callback might get
250   called). The speedcheck functions in lib/speedcheck.c are also used to
251   verify that the transfer is as fast as required.
252
253<a name="multi_done"></a>
254multi_done()
255-----------
256
257   Called after a transfer is done. This function takes care of everything
258   that has to be done after a transfer. This function attempts to leave
259   matters in a state so that `multi_do()` should be possible to call again on
260   the same connection (in a persistent connection case). It might also soon
261   be closed with `Curl_disconnect()`.
262
263<a name="Curl_disconnect"></a>
264Curl_disconnect()
265-----------------
266
267   When doing normal connections and transfers, no one ever tries to close any
268   connections so this is not normally called when `curl_easy_perform()` is
269   used. This function is only used when we are certain that no more transfers
270   are going to be made on the connection. It can be also closed by force, or
271   it can be called to make sure that libcurl doesn't keep too many
272   connections alive at the same time.
273
274   This function cleans up all resources that are associated with a single
275   connection.
276
277<a name="http"></a>
278HTTP(S)
279=======
280
281 HTTP offers a lot and is the protocol in curl that uses the most lines of
282 code. There is a special file (lib/formdata.c) that offers all the multipart
283 post functions.
284
 The base64 functions for user+password stuff (and more) are in lib/base64.c
 and all functions for parsing and sending cookies are found in lib/cookie.c.
287
288 HTTPS uses in almost every case the same procedure as HTTP, with only two
289 exceptions: the connect procedure is different and the function used to read
290 or write from the socket is different, although the latter fact is hidden in
291 the source by the use of `Curl_read()` for reading and `Curl_write()` for
292 writing data to the remote server.
293
 `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
295 encoding.
296
297 An interesting detail with the HTTP(S) request, is the `Curl_add_buffer()`
298 series of functions we use. They append data to one single buffer, and when
299 the building is finished the entire request is sent off in one single write.
300 This is done this way to overcome problems with flawed firewalls and lame
301 servers.
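
 The "build everything in one buffer, then send once" idea can be sketched
 like this. It is not the real `Curl_add_buffer()` code: `struct reqbuf`,
 `reqbuf_addf()` and `reqbuf_send()` are hypothetical names, and the sketch
 relies on C99 `vsnprintf()` for brevity even though libcurl itself targets
 C89.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not the real libcurl type. */
struct reqbuf {
  char *data;  /* NUL-terminated contents */
  size_t used; /* bytes in use, not counting the NUL */
};

/* Append printf-style formatted text. Returns 0 on success, -1 on error. */
static int reqbuf_addf(struct reqbuf *b, const char *fmt, ...)
{
  va_list ap;
  int need;
  char *bigger;

  assert(b && fmt);
  va_start(ap, fmt);
  need = vsnprintf(NULL, 0, fmt, ap); /* measure the formatted length */
  va_end(ap);
  if(need < 0)
    return -1;

  bigger = realloc(b->data, b->used + (size_t)need + 1);
  if(!bigger)
    return -1;
  b->data = bigger;

  va_start(ap, fmt);
  vsnprintf(b->data + b->used, (size_t)need + 1, fmt, ap);
  va_end(ap);
  b->used += (size_t)need;
  return 0;
}

/* Hand the complete request to the output in one single call. */
static size_t reqbuf_send(const struct reqbuf *b, FILE *out)
{
  return fwrite(b->data, 1, b->used, out);
}
```

 The request line and every header go through `reqbuf_addf()`, and only the
 finished buffer is written, so the peer never sees a half-built request.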
302
303<a name="ftp"></a>
304FTP
305===
306
307 The `Curl_if2ip()` function can be used for getting the IP number of a
308 specified network interface, and it resides in lib/if2ip.c.
309
310 `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
311 was made a separate function to prevent us programmers from forgetting that
312 they must be CRLF terminated. They must also be sent in one single write() to
313 make firewalls and similar happy.
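
 A minimal sketch of why a dedicated function helps: the CRLF is appended in
 one place so callers cannot forget it, and the result is one contiguous
 buffer that can go out in a single write. The name `ftp_format_cmd()` is
 hypothetical and this is not the `Curl_ftpsendf()` implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an FTP command and guarantee the trailing
   CRLF. Returns the total length, or -1 if it does not fit in 'out'. */
static int ftp_format_cmd(char *out, size_t outlen, const char *fmt, ...)
{
  va_list ap;
  int n;

  assert(out && fmt && outlen > 2);
  va_start(ap, fmt);
  n = vsnprintf(out, outlen - 2, fmt, ap); /* leave room for CRLF */
  va_end(ap);
  if(n < 0 || (size_t)n >= outlen - 2)
    return -1; /* formatting error or truncated command */

  out[n] = '\r';
  out[n + 1] = '\n';
  out[n + 2] = '\0';
  return n + 2;
}
```

 The returned length is what a caller would pass to its single write call.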
314
315<a name="kerberos"></a>
316Kerberos
317========
318
319 Kerberos support is mainly in lib/krb5.c and lib/security.c but also
320 `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
321 `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.
322
323<a name="telnet"></a>
324TELNET
325======
326
327 Telnet is implemented in lib/telnet.c.
328
329<a name="file"></a>
330FILE
331====
332
333 The file:// protocol is dealt with in lib/file.c.
334
335<a name="smb"></a>
336SMB
337===
338
339 The smb:// protocol is dealt with in lib/smb.c.
340
341<a name="ldap"></a>
342LDAP
343====
344
 Everything LDAP is in lib/ldap.c and lib/openldap.c.
346
347<a name="email"></a>
348E-mail
349======
350
351 The e-mail related source code is in lib/imap.c, lib/pop3.c and lib/smtp.c.
352
353<a name="general"></a>
354General
355=======
356
357 URL encoding and decoding, called escaping and unescaping in the source code,
358 is found in lib/escape.c.
359
360 While transferring data in Transfer() a few functions might get used.
361 `curl_getdate()` in lib/parsedate.c is for HTTP date comparisons (and more).
362
363 lib/getenv.c offers `curl_getenv()` which is for reading environment
364 variables in a neat platform independent way. That's used in the client, but
365 also in lib/url.c when checking the proxy environment variables. Note that
366 contrary to the normal unix getenv(), this returns an allocated buffer that
367 must be free()ed after use.
368
 lib/netrc.c holds the .netrc parser.
370
371 lib/timeval.c features replacement functions for systems that don't have
372 gettimeofday() and a few support functions for timeval conversions.
373
374 A function named `curl_version()` that returns the full curl version string
375 is found in lib/version.c.
376
377<a name="persistent"></a>
378Persistent Connections
379======================
380
381 The persistent connection support in libcurl requires some considerations on
382 how to do things inside of the library.
383
384 - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
385   must never hold connection-oriented data. It is meant to hold the root data
386   as well as all the options etc that the library-user may choose.
387
388 - The `Curl_easy` struct holds the "connection cache" (an array of
389   pointers to 'connectdata' structs).
390
391 - This enables the 'curl handle' to be reused on subsequent transfers.
392
393 - When libcurl is told to perform a transfer, it first checks for an already
394   existing connection in the cache that we can use. Otherwise it creates a
395   new one and adds that to the cache. If the cache is full already when a new
396   connection is added, it will first close the oldest unused one.
397
398 - When the transfer operation is complete, the connection is left
399   open. Particular options may tell libcurl not to, and protocols may signal
400   closure on connections and then they won't be kept open, of course.
401
402 - When `curl_easy_cleanup()` is called, we close all still opened connections,
403   unless of course the multi interface "owns" the connections.
404
405 The curl handle must be re-used in order for the persistent connections to
406 work.
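
 The lookup rule from the list above (reuse a cached connection if one fits,
 otherwise create one and close the oldest unused one when the cache is full)
 can be sketched as follows. Everything here is hypothetical and heavily
 simplified: `struct conn` stands in for 'connectdata', a host name is the
 only matching criterion, and the cache holds just two entries so that
 eviction is easy to see.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SIZE 2 /* tiny on purpose */

/* Hypothetical, heavily simplified stand-in for 'connectdata'. */
struct conn {
  char host[64];
  int open;            /* non-zero while the connection is alive */
  int in_use;          /* non-zero while a transfer is using it */
  unsigned long stamp; /* last use; a lower value means older */
};

struct conncache {
  struct conn slots[CACHE_SIZE];
  unsigned long clock; /* increases on every use */
};

/* Return a connection to 'host'. Reuse an idle cached one if possible,
   otherwise open a new one, replacing the oldest idle connection if no
   slot is free. '*reused' tells which case applied. Returns NULL if
   every slot is busy. */
static struct conn *cache_get(struct conncache *cc, const char *host,
                              int *reused)
{
  struct conn *victim = NULL;
  int i;

  assert(cc && host && reused);
  assert(strlen(host) < sizeof(cc->slots[0].host));
  for(i = 0; i < CACHE_SIZE; i++) {
    struct conn *c = &cc->slots[i];
    if(c->in_use)
      continue;
    if(c->open && !strcmp(c->host, host)) {
      *reused = 1;
      c->in_use = 1;
      c->stamp = ++cc->clock;
      return c;
    }
    /* prefer a free slot, otherwise remember the oldest idle one */
    if(!victim || (victim->open && (!c->open || c->stamp < victim->stamp)))
      victim = c;
  }
  if(!victim)
    return NULL;
  *reused = 0;
  strcpy(victim->host, host); /* "close" the old one, "open" the new */
  victim->open = 1;
  victim->in_use = 1;
  victim->stamp = ++cc->clock;
  return victim;
}

/* Transfer done: leave the connection open so it can be reused. */
static void cache_release(struct conn *c)
{
  c->in_use = 0;
}
```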
407
408<a name="multi"></a>
409multi interface/non-blocking
410============================
411
412 The multi interface is a non-blocking interface to the library. To make that
413 interface work as well as possible, no low-level functions within libcurl
414 must be written to work in a blocking manner. (There are still a few spots
415 violating this rule.)
416
417 One of the primary reasons we introduced c-ares support was to allow the name
418 resolve phase to be perfectly non-blocking as well.
419
420 The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
421 the code to allow non-blocking operations even on multi-stage command-
422 response protocols. They are built around state machines that return when
423 they would otherwise block waiting for data.  The DICT, LDAP and TELNET
424 protocols are crappy examples and they are subject for rewrite in the future
425 to better fit the libcurl protocol family.
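
 A state machine of this kind can be sketched as below. It is an invented
 miniature, not code from lib/ftp.c: the states, `struct machine` and
 `machine_step()` are all hypothetical. The point it illustrates is that the
 step function returns to its caller whenever it would otherwise have to wait
 for a server response.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states for a tiny command-response login. */
enum state {
  ST_WAIT_GREETING,
  ST_WAIT_USER_REPLY,
  ST_WAIT_PASS_REPLY,
  ST_DONE
};

struct machine {
  enum state state;
  int commands_sent; /* counts the commands "sent" so far */
};

/* Advance one step. 'reply' is the next server response, or NULL if
   nothing has arrived yet. Returns 1 when the login is complete and 0
   when the caller should come back later. It never waits. */
static int machine_step(struct machine *m, const char *reply)
{
  assert(m != NULL);
  if(m->state == ST_DONE)
    return 1;
  if(!reply)
    return 0; /* would block: hand control back instead */

  switch(m->state) {
  case ST_WAIT_GREETING:
    m->commands_sent++; /* this is where USER would be sent */
    m->state = ST_WAIT_USER_REPLY;
    break;
  case ST_WAIT_USER_REPLY:
    m->commands_sent++; /* this is where PASS would be sent */
    m->state = ST_WAIT_PASS_REPLY;
    break;
  case ST_WAIT_PASS_REPLY:
    m->state = ST_DONE;
    break;
  case ST_DONE:
    break;
  }
  return m->state == ST_DONE;
}
```

 Because all progress is stored in the struct, the caller is free to serve
 other transfers between two calls.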
426
427<a name="ssl"></a>
428SSL libraries
429=============
430
431 Originally libcurl supported SSLeay for SSL/TLS transports, but that was then
432 extended to its successor OpenSSL but has since also been extended to several
433 other SSL/TLS libraries and we expect and hope to further extend the support
434 in future libcurl versions.
435
436 To deal with this internally in the best way possible, we have a generic SSL
437 function API as provided by the vtls/vtls.[ch] system, and they are the only
438 SSL functions we must use from within libcurl. vtls is then crafted to use
439 the appropriate lower-level function calls to whatever SSL library that is in
440 use. For example vtls/openssl.[ch] for the OpenSSL library.
441
442<a name="symbols"></a>
443Library Symbols
444===============
445
446 All symbols used internally in libcurl must use a `Curl_` prefix if they're
447 used in more than a single file. Single-file symbols must be made static.
448 Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
449 but they are to be changed to follow this pattern in future versions.) Public
450 API functions are marked with `CURL_EXTERN` in the public header files so
451 that all others can be hidden on platforms where this is possible.
452
453<a name="returncodes"></a>
454Return Codes and Informationals
455===============================
456
457 I've made things simple. Almost every function in libcurl returns a CURLcode,
458 that must be `CURLE_OK` if everything is OK or otherwise a suitable error
459 code as the curl/curl.h include file defines. The very spot that detects an
460 error must use the `Curl_failf()` function to set the human-readable error
461 description.
462
463 In aiding the user to understand what's happening and to debug curl usage, we
464 must supply a fair number of informational messages by using the
465 `Curl_infof()` function. Those messages are only displayed when the user
466 explicitly asks for them. They are best used when revealing information that
467 isn't otherwise obvious.
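
 The convention can be sketched in miniature as follows. None of these are
 libcurl's real types or functions: `enum code`, `struct handle`, and the
 local `failf()`, `infof()` and `connect_to()` are hypothetical stand-ins
 that only mirror the pattern described above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature of the pattern; not libcurl's real types. */
enum code { CODE_OK = 0, CODE_COULDNT_CONNECT = 7 };

struct handle {
  char errorbuf[128]; /* human-readable description of the last error */
  int verbose;        /* non-zero if the user asked for informationals */
  int info_count;     /* informational messages actually shown */
};

/* Record why something failed; called at the spot that detects it. */
static void failf(struct handle *h, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(h->errorbuf, sizeof(h->errorbuf), fmt, ap);
  va_end(ap);
}

/* Show an informational message, but only when asked to. */
static void infof(struct handle *h, const char *fmt, ...)
{
  va_list ap;
  if(!h->verbose)
    return;
  va_start(ap, fmt);
  vfprintf(stderr, fmt, ap);
  va_end(ap);
  h->info_count++;
}

/* 'reachable' fakes the outcome of a connect attempt. */
static enum code connect_to(struct handle *h, const char *host, int reachable)
{
  assert(h && host);
  infof(h, "Trying %s...\n", host);
  if(!reachable) {
    failf(h, "Failed to connect to %s", host);
    return CODE_COULDNT_CONNECT;
  }
  return CODE_OK;
}
```

 Callers further up only pass the return code along; the error text was
 already stored by the function that knew the details.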
468
469<a name="abi"></a>
470API/ABI
471=======
472
473 We make an effort to not export or show internals or how internals work, as
474 that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
475 for our promise to users.
476
477<a name="client"></a>
478Client
479======
480
481 main() resides in `src/tool_main.c`.
482
483 `src/tool_hugehelp.c` is automatically generated by the mkhelp.pl perl script
484 to display the complete "manual" and the `src/tool_urlglob.c` file holds the
485 functions used for the URL-"globbing" support. Globbing in the sense that the
486 {} and [] expansion stuff is there.
487
488 The client mostly sets up its 'config' struct properly, then
489 it calls the `curl_easy_*()` functions of the library and when it gets back
490 control after the `curl_easy_perform()` it cleans up the library, checks
491 status and exits.
492
493 When the operation is done, the ourWriteOut() function in src/writeout.c may
494 be called to report about the operation. That function is using the
495 `curl_easy_getinfo()` function to extract useful information from the curl
496 session.
497
498 It may loop and do all this several times if many URLs were specified on the
499 command line or config file.
500
501<a name="memorydebug"></a>
502Memory Debugging
503================
504
 The file lib/memdebug.c contains debug versions of a few functions, such as
 malloc, free, fopen and fclose, that deal with resources that might give us
 problems if we "leak" them. The functions in the memdebug system do nothing
 fancy: they do their normal job and then log information about what they
 just did. The logged data can then be analyzed after a complete session.
511
512 memanalyze.pl is the perl script present in tests/ that analyzes a log file
513 generated by the memory tracking system. It detects if resources are
514 allocated but never freed and other kinds of errors related to resource
515 management.
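
 The "do the normal thing, then log it" approach can be sketched like this.
 It is an illustration only: `dbg_malloc()`, `dbg_free()`, `memlog` and
 `outstanding` are hypothetical names, and the log line format is this
 sketch's own rather than the exact format memanalyze.pl expects.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static FILE *memlog;     /* where the trace goes; NULL disables logging */
static long outstanding; /* allocations not yet freed */

/* Do the normal malloc, then log what just happened. */
static void *dbg_malloc(size_t size, int line, const char *source)
{
  void *p;

  assert(source != NULL);
  p = malloc(size);
  if(p)
    outstanding++;
  if(memlog)
    fprintf(memlog, "MEM %s:%d malloc(%lu) = %p\n",
            source, line, (unsigned long)size, p);
  return p;
}

/* Log the free, then do it. */
static void dbg_free(void *p, int line, const char *source)
{
  assert(source != NULL);
  if(p)
    outstanding--;
  if(memlog)
    fprintf(memlog, "MEM %s:%d free(%p)\n", source, line, p);
  free(p);
}

/* Redirect ordinary calls so the rest of the code needs no changes.
   These defines must come after the wrappers above. */
#define malloc(size) dbg_malloc(size, __LINE__, __FILE__)
#define free(ptr)    dbg_free(ptr, __LINE__, __FILE__)
```

 Any code compiled after the two defines keeps calling `malloc()` and
 `free()` as usual, and a non-zero `outstanding` at exit hints at a leak.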
516
 Internally, the preprocessor symbol DEBUGBUILD guards code that is only
 compiled for debug-enabled builds, and the symbol CURLDEBUG marks code that
 is _only_ used for memory tracking/debugging.
520
 Use -DCURLDEBUG when compiling to enable memory debugging; this is also
 switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
 when compiling to enable a debug build, or run configure with --enable-debug.
524
525 curl --version will list 'Debug' feature for debug enabled builds, and
526 will list 'TrackMemory' feature for curl debug memory tracking capable
527 builds. These features are independent and can be controlled when running
528 the configure script. When --enable-debug is given both features will be
529 enabled, unless some restriction prevents memory tracking from being used.
530
531<a name="test"></a>
532Test Suite
533==========
534
535 The test suite is placed in its own subdirectory directly off the root in the
536 curl archive tree, and it contains a bunch of scripts and a lot of test case
537 data.
538
539 The main test script is runtests.pl that will invoke test servers like
540 httpserver.pl and ftpserver.pl before all the test cases are performed. The
541 test suite currently only runs on Unix-like platforms.
542
543 You'll find a description of the test suite in the tests/README file, and the
544 test case data files in the tests/FILEFORMAT file.
545
546 The test suite automatically detects if curl was built with the memory
547 debugging enabled, and if it was, it will detect memory leaks, too.
548
549<a name="asyncdns"></a>
550Asynchronous name resolves
551==========================
552
553 libcurl can be built to do name resolves asynchronously, using either the
554 normal resolver in a threaded manner or by using c-ares.
555
556<a name="cares"></a>
557[c-ares][3]
558------
559
### Build libcurl to use c-ares
561
5621. ./configure --enable-ares=/path/to/ares/install
5632. make
564
565### c-ares on win32
566
567 First I compiled c-ares. I changed the default C runtime library to be the
568 single-threaded rather than the multi-threaded (this seems to be required to
 prevent linking errors later on). Then I simply built the areslib project
570 (the other projects adig/ahost seem to fail under MSVC).
571
572 Next was libcurl. I opened lib/config-win32.h and I added a:
573 `#define USE_ARES 1`
574
575 Next thing I did was I added the path for the ares includes to the include
576 path, and the libares.lib to the libraries.
577
578 Lastly, I also changed libcurl to be single-threaded rather than
579 multi-threaded, again this was to prevent some duplicate symbol errors. I'm
580 not sure why I needed to change everything to single-threaded, but when I
581 didn't I got redefinition errors for several CRT functions (malloc, stricmp,
582 etc.)
583
584<a name="curl_off_t"></a>
585`curl_off_t`
586==========
587
588 `curl_off_t` is a data type provided by the external libcurl include
589 headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
 options that end with LARGE. The type is 64 bits large on most modern
591 platforms.
592
593<a name="curlx"></a>
594curlx
595=====
596
597 The libcurl source code offers a few functions by source only. They are not
598 part of the official libcurl API, but the source files might be useful for
599 others so apps can optionally compile/build with these sources to gain
600 additional functions.
601
602 We provide them through a single header file for easy access for apps:
603 "curlx.h"
604
605`curlx_strtoofft()`
606-------------------
607   A macro that converts a string containing a number to a `curl_off_t` number.
608   This might use the `curlx_strtoll()` function which is provided as source
609   code in strtoofft.c. Note that the function is only provided if no
   strtoll() (or equivalent) function exists on your platform. If `curl_off_t`
611   is only a 32 bit number on your platform, this macro uses strtol().
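
   What such a conversion involves can be sketched with the standard
   `strtoll()`. This is not the `curlx_strtoofft()` macro itself: the
   `off64` typedef stands in for a 64-bit `curl_off_t`, and `parse_off()`
   is a hypothetical name for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for curl_off_t where it is 64 bits wide. */
typedef long long off64;

/* Parse a decimal number from 'str' into '*out'. Returns 0 on success
   and -1 if there are no digits, trailing junk, or the value is out of
   range. */
static int parse_off(const char *str, off64 *out)
{
  char *end;
  long long v;

  assert(str && out);
  errno = 0;
  v = strtoll(str, &end, 10);
  if(end == str || *end != '\0' || errno == ERANGE)
    return -1;
  *out = v;
  return 0;
}
```

   On a platform where the offset type is only 32 bits, the same shape would
   apply with `strtol()` in place of `strtoll()`.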
612
613Future
614------
615
616 Several functions will be removed from the public `curl_` name space in a
617 future libcurl release. They will then only become available as `curlx_`
618 functions instead. To make the transition easier, we already today provide
619 these functions with the `curlx_` prefix to allow sources to be built
620 properly with the new function names. The concerned functions are:
621
622 - `curlx_getenv`
623 - `curlx_strequal`
624 - `curlx_strnequal`
625 - `curlx_mvsnprintf`
626 - `curlx_msnprintf`
627 - `curlx_maprintf`
628 - `curlx_mvaprintf`
629 - `curlx_msprintf`
630 - `curlx_mprintf`
631 - `curlx_mfprintf`
632 - `curlx_mvsprintf`
633 - `curlx_mvprintf`
634 - `curlx_mvfprintf`
635
636<a name="contentencoding"></a>
637Content Encoding
638================
639
640## About content encodings
641
642 [HTTP/1.1][4] specifies that a client may request that a server encode its
643 response. This is usually used to compress a response using one (or more)
644 encodings from a set of commonly available compression techniques. These
 schemes include 'deflate' (the zlib algorithm), 'gzip', 'br' (brotli) and
646 'compress'. A client requests that the server perform an encoding by including
647 an Accept-Encoding header in the request document. The value of the header
648 should be one of the recognized tokens 'deflate', ... (there's a way to
649 register new schemes/tokens, see sec 3.5 of the spec). A server MAY honor
650 the client's encoding request. When a response is encoded, the server
651 includes a Content-Encoding header in the response. The value of the
652 Content-Encoding header indicates which encodings were used to encode the
653 data, in the order in which they were applied.
654
655 It's also possible for a client to attach priorities to different schemes so
656 that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
657 information on the Accept-Encoding header. See sec [3.1.2.2 of RFC 7231][15]
658 for more information on the Content-Encoding header.
659
660## Supported content encodings
661
662 The 'deflate', 'gzip' and 'br' content encodings are supported by libcurl.
663 Both regular and chunked transfers work fine.  The zlib library is required
664 for the 'deflate' and 'gzip' encodings, while the brotli decoding library is
665 for the 'br' encoding.
666
667## The libcurl interface
668
669 To cause libcurl to request a content encoding use:
670
671  [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)
672
673 where string is the intended value of the Accept-Encoding header.
674
 Currently, libcurl does support multiple encodings but only
 understands how to process responses that use the "deflate", "gzip" and/or
 "br" content encodings, so the only values for [`CURLOPT_ACCEPT_ENCODING`][5]
 that will work (besides "identity", which does nothing) are "deflate",
 "gzip" and "br". If a response is encoded using "compress" or another
 unsupported method, libcurl will return an error indicating that the
 response could not be decoded.  If `<string>` is NULL no Accept-Encoding
 header is generated. If `<string>` is a zero-length string, then an
 Accept-Encoding header containing all supported encodings will be generated.
684
685 The [`CURLOPT_ACCEPT_ENCODING`][5] must be set to any non-NULL value for
686 content to be automatically decoded.  If it is not set and the server still
687 sends encoded content (despite not having been asked), the data is returned
688 in its raw form and the Content-Encoding type is not checked.
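
 The check described above, where a response using an encoding that cannot be
 decoded is rejected, can be sketched as a walk over the Content-Encoding
 value. This is a hypothetical illustration and not libcurl's decoder setup:
 `can_decode()` and `is_supported()` are made-up names, the list of supported
 tokens is hard-coded, and the comparison is case-sensitive for simplicity
 even though HTTP treats these tokens as case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Encodings this sketch pretends it can decode. */
static const char *const supported[] = {
  "identity", "deflate", "gzip", "br"
};

static int is_supported(const char *tok, size_t len)
{
  size_t i;
  for(i = 0; i < sizeof(supported) / sizeof(supported[0]); i++) {
    if(strlen(supported[i]) == len && !strncmp(supported[i], tok, len))
      return 1;
  }
  return 0;
}

/* Walk a Content-Encoding value such as "gzip, br". Returns 1 if every
   listed encoding can be decoded and 0 otherwise. */
static int can_decode(const char *value)
{
  assert(value != NULL);
  while(*value) {
    size_t len;
    /* skip separators between tokens */
    while(*value == ',' || isspace((unsigned char)*value))
      value++;
    len = strcspn(value, ", \t");
    if(len && !is_supported(value, len))
      return 0;
    value += len;
  }
  return 1;
}
```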
689
690## The curl interface
691
692 Use the [--compressed][6] option with curl to cause it to ask servers to
693 compress responses using any format supported by curl.
694
695<a name="hostip"></a>
696hostip.c explained
697==================
698
699 The main compile-time defines to keep in mind when reading the host*.c source
 files are these:
701
702## `CURLRES_IPV6`
703
704 this host has getaddrinfo() and family, and thus we use that. The host may
705 not be able to resolve IPv6, but we don't really have to take that into
706 account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.
707
708## `CURLRES_ARES`
709
710 is defined if libcurl is built to use c-ares for asynchronous name
711 resolves. This can be Windows or *nix.
712
713## `CURLRES_THREADED`
714
715 is defined if libcurl is built to use threading for asynchronous name
716 resolves. The name resolve will be done in a new thread, and the supported
717 asynch API will be the same as for ares-builds. This is the default under
718 (native) Windows.
719
720 If any of the two previous are defined, `CURLRES_ASYNCH` is defined too. If
721 libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
722 defined.
723
724## host*.c sources
725
 The host*.c source files are split up like this:
727
728 - hostip.c      - method-independent resolver functions and utility functions
729 - hostasyn.c    - functions for asynchronous name resolves
730 - hostsyn.c     - functions for synchronous name resolves
731 - asyn-ares.c   - functions for asynchronous name resolves using c-ares
732 - asyn-thread.c - functions for asynchronous name resolves using threads
733 - hostip4.c     - IPv4 specific functions
734 - hostip6.c     - IPv6 specific functions
735
736 The hostip.h is the single united header file for all this. It defines the
737 `CURLRES_*` defines based on the config*.h and `curl_setup.h` defines.
738
739<a name="memoryleak"></a>
740Track Down Memory Leaks
741=======================
742
743## Single-threaded
744
  Please note that this memory leak system is not adjusted to work in more
  than one thread. If you want/need to use it in a multi-threaded app, please
  adjust accordingly.
748
749
750## Build
751
752  Rebuild libcurl with -DCURLDEBUG (usually, rerunning configure with
753  --enable-debug fixes this). 'make clean' first, then 'make' so that all
754  files are actually rebuilt properly. It will also make sense to build
755  libcurl with the debug option (usually -g to the compiler) so that debugging
756  it will be easier if you actually do find a leak in the library.
757
758  This will create a library that has memory debugging enabled.
759
760## Modify Your Application
761
762  Add a line in your application code:
763
764       `curl_memdebug("dump");`
765
766  This will make the malloc debug system output a full trace of all resource
767  using functions to the given file name. Make sure you rebuild your program
768  and that you link with the same libcurl you built for this purpose as
769  described above.
770
771## Run Your Application
772
773  Run your program as usual. Watch the specified memory trace file grow.
774
  Make your program exit and use the proper libcurl cleanup functions etc., so
  that all non-leaks are returned/freed properly.
777
778## Analyze the Flow
779
780  Use the tests/memanalyze.pl perl script to analyze the dump file:
781
782    tests/memanalyze.pl dump
783
  This now outputs a report on which resources were allocated but never
  freed etc. This report is very fine for posting to the list!

  If this doesn't produce any output, no leak was detected in libcurl. Then
  the leak is most likely to be in your code.
789
790<a name="multi_socket"></a>
791`multi_socket`
792==============
793
794 Implementation of the `curl_multi_socket` API
795
796  The main ideas of this API are simply:
797
798   1 - The application can use whatever event system it likes as it gets info
799       from libcurl about what file descriptors libcurl waits for what action
800       on. (The previous API returns `fd_sets` which is very select()-centric).
801
802   2 - When the application discovers action on a single socket, it calls
803       libcurl and informs that there was action on this particular socket and
804       libcurl can then act on that socket/transfer only and not care about
805       any other transfers. (The previous API always had to scan through all
806       the existing transfers.)
807
808  The idea is that [`curl_multi_socket_action()`][7] calls a given callback
809  with information about what socket to wait for what action on, and the
810  callback only gets called if the status of that socket has changed.
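  The "only called on change" rule can be sketched in a few lines. This is a
  toy model and not libcurl code: every `toy_` name is hypothetical, and the
  action values merely stand in for libcurl's `CURL_POLL_*` constants.

```c
#include <assert.h>

/* Hypothetical action values standing in for CURL_POLL_*. */
enum toy_action { TOY_POLL_NONE, TOY_POLL_IN, TOY_POLL_OUT, TOY_POLL_INOUT };

typedef void (*toy_socket_cb)(int fd, enum toy_action what, void *userp);

struct toy_socket {
  int fd;
  enum toy_action reported; /* what the application was last told */
};

/* Tell the application what to wait for on this socket, but only when that
   differs from what it was told last time. Returns 1 if the callback ran. */
static int toy_report(struct toy_socket *s, enum toy_action wanted,
                      toy_socket_cb cb, void *userp)
{
  assert(s && cb);
  if(s->reported == wanted)
    return 0; /* status unchanged, stay silent */
  s->reported = wanted;
  cb(s->fd, wanted, userp);
  return 1;
}

/* Example callback: counts how many times it was invoked. */
static void toy_count_cb(int fd, enum toy_action what, void *userp)
{
  (void)fd;
  (void)what;
  ++*(int *)userp;
}
```

  An application callback would normally register or update `fd` in its own
  event system instead of counting calls.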
811
  We also added a timer callback that makes libcurl call the application when
  the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
  and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work
  internally, there's an added struct to each easy handle in which we store
  an "expire time" (if any). The structs are then "splay sorted" so that we
  can add and remove times from the tree and yet somewhat swiftly
  figure out both how long there is until the next nearest timer expires
  and which timer (handle) we should take care of now. Of course, the upside
  of all this is that we get a [`curl_multi_timeout()`][8] that should also
  work with old-style applications that use [`curl_multi_perform()`][11].
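  The two questions the timer bookkeeping has to answer (how long until the
  nearest timer, and which handle owns it) can be illustrated with a toy
  model. This sketch assumes a plain array and a linear scan in place of the
  splay tree, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_handle {
  long expire_ms; /* absolute expire time in ms, 0 means "no timer" */
};

/* Return the index of the handle whose timer expires first, or -1 if no
   handle has a timer. On success *timeout_ms is set to the time left until
   that timer fires (0 if it has already expired). */
static int toy_next_timer(const struct toy_handle *h, size_t n, long now_ms,
                          long *timeout_ms)
{
  size_t i;
  int best = -1;
  assert(timeout_ms != NULL);
  for(i = 0; i < n; i++) {
    if(h[i].expire_ms &&
       (best < 0 || h[i].expire_ms < h[best].expire_ms))
      best = (int)i;
  }
  if(best >= 0) {
    long left = h[best].expire_ms - now_ms;
    *timeout_ms = left > 0 ? left : 0;
  }
  return best;
}
```

  A sorted tree answers the same question without visiting every handle,
  which is why the real code does not scan like this.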
822
  We created an internal "socket to easy handles" hash table that given
  a socket (file descriptor) returns the easy handle that waits for action on
  that socket.  This hash is made using the already existing hash code
  (previously only used for the DNS cache).
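  A minimal sketch of such a lookup table follows. It is not libcurl's hash
  code: it assumes a small fixed-size table with linear probing, supports no
  removal, and uses hypothetical `toy_` names throughout. The table must be
  zero-initialized before use.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 64 /* fixed size; keep it larger than the socket count */

struct toy_sockhash {
  int fd[TOY_SLOTS];     /* key: socket descriptor, valid if easy != NULL */
  void *easy[TOY_SLOTS]; /* value: the handle waiting on that socket */
};

/* Linear probing: return the slot holding fd, or the first free slot on
   its probe path, or TOY_SLOTS when the table is full. */
static size_t toy_slot(const struct toy_sockhash *h, int fd)
{
  size_t start = (size_t)(unsigned int)fd % TOY_SLOTS;
  size_t i;
  for(i = 0; i < TOY_SLOTS; i++) {
    size_t s = (start + i) % TOY_SLOTS;
    if(!h->easy[s] || h->fd[s] == fd)
      return s;
  }
  return TOY_SLOTS;
}

/* Store fd -> easy. Returns 1 on success, 0 if the table is full. */
static int toy_sockhash_add(struct toy_sockhash *h, int fd, void *easy)
{
  size_t s;
  assert(h && easy);
  s = toy_slot(h, fd);
  if(s == TOY_SLOTS)
    return 0;
  h->fd[s] = fd;
  h->easy[s] = easy;
  return 1;
}

/* Return the handle registered for fd, or NULL if there is none. */
static void *toy_sockhash_find(const struct toy_sockhash *h, int fd)
{
  size_t s = toy_slot(h, fd);
  return s == TOY_SLOTS ? NULL : h->easy[s];
}
```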
827
  To make libcurl able to report plain sockets in the socket callback, we had
  to re-organize the internals of [`curl_multi_fdset()`][12] etc. so that
  the conversion from sockets to `fd_sets` for that function is only done in
  the last step before the data is returned. I also had to extend c-ares to
  get a function that can return plain sockets, as that library too returned
  only `fd_sets` and that is no longer good enough. The changes done to c-ares
  are available in c-ares 1.3.1 and later.

<a name="structs"></a>
Structs in libcurl
==================

This section should cover 7.32.0 pretty accurately, but will make sense even
for older and later versions as things don't change drastically that often.

## Curl_easy

  The `Curl_easy` struct is the one returned to the outside in the external API
  as a "CURL *". This is usually known as an easy handle in API documentation
  and examples.

  Information and state that is related to the actual connection is in the
  'connectdata' struct. When a transfer is about to be made, libcurl will
  either create a new connection or re-use an existing one. The particular
  connectdata that is used by this handle is pointed out by
  `Curl_easy->easy_conn`.

  Data and information regarding this particular single transfer are kept in
  the SingleRequest sub-struct.

  When the `Curl_easy` struct is added to a multi handle, as it must be in
  order to do any transfer, the `->multi` member will point to the
  `Curl_multi` struct it belongs to. The `->prev` and `->next` members will
  then be used by the multi code to keep a linked list of `Curl_easy` structs
  that are added to that same multi handle. libcurl always uses multi so
  `->multi` *will* point to a `Curl_multi` when a transfer is in progress.

  `->mstate` is the multi state of this particular `Curl_easy`. When
  `multi_runsingle()` is called, it will act on this handle according to which
  state it is in. The mstate is also what tells which sockets to return for a
  specific `Curl_easy` when [`curl_multi_fdset()`][12] is called etc.

  The libcurl source code generally uses the name 'data' for the variable that
  points to the `Curl_easy`.

  When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated with
  an individual stream, sharing the same connectdata struct. Multiplexing
  makes it even more important to keep things associated with the right thing!

## connectdata

  A general idea in libcurl is to keep connections around in a connection
  "cache" after they have been used in case they will be used again, and then
  re-use an existing one instead of creating a new one, as that gives a
  significant performance boost.

  Each 'connectdata' identifies a single physical connection to a server. If
  the connection can't be kept alive, the connection will be closed after use
  and then this struct can be removed from the cache and freed.
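  The reuse-or-create decision can be sketched as below. This is a toy model
  under simplifying assumptions: a connection is matched on host name and port
  only, the cache is a small fixed array rather than a hash table, and every
  `toy_` name is hypothetical. The cache must be zero-initialized before use.

```c
#include <assert.h>
#include <string.h>

#define TOY_CACHE_MAX 4

struct toy_conn {
  char host[64];
  int port;
  int in_use; /* nonzero while a transfer owns the connection */
  int alive;  /* nonzero while the slot holds a connection */
};

struct toy_conncache {
  struct toy_conn conns[TOY_CACHE_MAX];
};

/* Pick an idle cached connection to host:port, or set up a new one in a
   free slot. *reused tells which of the two happened. Returns NULL (and
   leaves *reused untouched) when the cache is full. */
static struct toy_conn *toy_get_conn(struct toy_conncache *c,
                                     const char *host, int port, int *reused)
{
  int i;
  struct toy_conn *free_slot = NULL;
  assert(c && host && reused);
  assert(strlen(host) < sizeof(free_slot->host));
  for(i = 0; i < TOY_CACHE_MAX; i++) {
    struct toy_conn *conn = &c->conns[i];
    if(!conn->alive) {
      if(!free_slot)
        free_slot = conn;
    }
    else if(!conn->in_use && conn->port == port &&
            !strcmp(conn->host, host)) {
      conn->in_use = 1;
      *reused = 1;
      return conn;
    }
  }
  if(free_slot) {
    strcpy(free_slot->host, host);
    free_slot->port = port;
    free_slot->alive = 1;
    free_slot->in_use = 1;
    *reused = 0;
  }
  return free_slot;
}

/* Called when a transfer is done: keep the connection cached if it can be
   kept alive, otherwise drop it from the cache. */
static void toy_release_conn(struct toy_conn *conn, int keep_alive)
{
  assert(conn != NULL);
  conn->in_use = 0;
  conn->alive = keep_alive;
}
```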
887
  Thus, the same `Curl_easy` can be used multiple times and each time select
  another connectdata struct to use for the connection. Keep this in mind, as
  it is then important to consider if options or choices are based on the
  connection or the `Curl_easy`.

  Functions in libcurl will assume that connectdata->data points to the
  `Curl_easy` that uses this connection (for the moment).

  As a special complexity, some protocols supported by libcurl require a
  special disconnect procedure that is more than just shutting down the
  socket. It can involve sending one or more commands to the server before
  doing so. Since connections are kept in the connection cache after use, the
  original `Curl_easy` may no longer be around when the time comes to shut down
  a particular connection. For this purpose, libcurl holds a special dummy
  `closure_handle` `Curl_easy` in the `Curl_multi` struct to use when needed.

  FTP uses two TCP connections for a typical transfer but it keeps both in
  this single struct and thus can be considered a single connection for most
  internal concerns.

  The libcurl source code generally uses the name 'conn' for the variable that
  points to the connectdata.

## Curl_multi

  Internally, the easy interface is implemented as a wrapper around multi
  interface functions. This makes everything multi interface.

  `Curl_multi` is the multi handle struct exposed as "CURLM *" in external
  APIs.

  This struct holds a list of `Curl_easy` structs that have been added to this
  handle with [`curl_multi_add_handle()`][13]. The start of the list is
  `->easyp` and `->num_easy` is a counter of added `Curl_easy`s.
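  The list bookkeeping described here can be sketched as a plain doubly linked
  list. The structs below are toy stand-ins that carry only the members named
  in the text; the `last` pointer and all `toy_` names are assumptions made
  for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct toy_multi;

struct toy_easy {
  struct toy_easy *prev;
  struct toy_easy *next;
  struct toy_multi *multi; /* NULL while not added to a multi handle */
};

struct toy_multi {
  struct toy_easy *easyp; /* first handle in the list */
  struct toy_easy *last;  /* last handle in the list (hypothetical) */
  int num_easy;           /* number of added handles */
};

/* Append e to m's list. Returns 0 if e already belongs to a multi. */
static int toy_add_handle(struct toy_multi *m, struct toy_easy *e)
{
  assert(m && e);
  if(e->multi)
    return 0;
  e->next = NULL;
  e->prev = m->last;
  if(m->last)
    m->last->next = e;
  else
    m->easyp = e;
  m->last = e;
  e->multi = m;
  m->num_easy++;
  return 1;
}

/* Unlink e from m's list. Returns 0 if e does not belong to m. */
static int toy_remove_handle(struct toy_multi *m, struct toy_easy *e)
{
  assert(m && e);
  if(e->multi != m)
    return 0;
  if(e->prev)
    e->prev->next = e->next;
  else
    m->easyp = e->next;
  if(e->next)
    e->next->prev = e->prev;
  else
    m->last = e->prev;
  e->prev = e->next = NULL;
  e->multi = NULL;
  m->num_easy--;
  return 1;
}
```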
922
  `->msglist` is a linked list of messages to send back when
  [`curl_multi_info_read()`][14] is called. Basically a node is added to that
  list when an individual `Curl_easy`'s transfer has completed.

  `->hostcache` points to the name cache. It is a hash table for looking up
  name to IP. The nodes have a limited life time in there and this cache is
  meant to reduce the time for when the same name is wanted within a short
  period of time.

  `->timetree` points to a tree of `Curl_easy`s, sorted by the remaining time
  until it should be checked - normally some sort of timeout. Each `Curl_easy`
  has one node in the tree.

  `->sockhash` is a hash table that allows fast lookups of which `Curl_easy`
  uses a given socket descriptor. This is necessary for the `multi_socket`
  API.

  `->conn_cache` points to the connection cache. It keeps track of all
  connections that are kept after use. The cache has a maximum size.

  `->closure_handle` is described in the 'connectdata' section.

  The libcurl source code generally uses the name 'multi' for the variable
  that points to the `Curl_multi` struct.

## Curl_handler

  Each unique protocol that is supported by libcurl needs to provide at least
  one `Curl_handler` struct. It defines what the protocol is called and what
  functions the main code should call to deal with protocol specific issues.
  In general, there's a source file named [protocol].c in which there's a
  "struct `Curl_handler` `Curl_handler_[protocol]`" declared. In url.c there's
  then a single main array that points to all the individual `Curl_handler`
  structs. That array is scanned through when a URL is given to libcurl to
  work with.
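  The "one struct per protocol, one array that is scanned" arrangement can be
  sketched as follows. The struct here is a toy with just two members, the
  table holds three example entries, and all `toy_` names are hypothetical;
  the real struct has many more members, as described below.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

struct toy_handler {
  const char *scheme; /* URL scheme name */
  int defport;        /* default port */
};

static const struct toy_handler toy_handler_http = { "HTTP", 80 };
static const struct toy_handler toy_handler_https = { "HTTPS", 443 };
static const struct toy_handler toy_handler_ftp = { "FTP", 21 };

/* NULL-terminated array pointing to every handler. */
static const struct toy_handler * const toy_protocols[] = {
  &toy_handler_http, &toy_handler_https, &toy_handler_ftp, NULL
};

/* Case-insensitive string comparison, returns 1 on a full match. */
static int toy_scheme_equal(const char *a, const char *b)
{
  while(*a && *b) {
    if(toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b; /* both must end at the same position */
}

/* Scan the array for the handler of the given scheme, NULL if unknown. */
static const struct toy_handler *toy_find_handler(const char *scheme)
{
  const struct toy_handler * const *pp;
  assert(scheme != NULL);
  for(pp = toy_protocols; *pp; pp++) {
    if(toy_scheme_equal((*pp)->scheme, scheme))
      return *pp;
  }
  return NULL;
}
```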
958
  `->scheme` is the URL scheme name, usually spelled out in uppercase. That's
  "HTTP" or "FTP" etc. SSL versions of the protocol need their own
  `Curl_handler` setup, so HTTPS is separate from HTTP.

  `->setup_connection` is called to allow the protocol code to allocate
  protocol specific data that then gets associated with that `Curl_easy` for
  the rest of this transfer. It gets freed again at the end of the transfer.
  It will be called before the 'connectdata' for the transfer has been
  selected/created. Most protocols will allocate their private
  'struct [PROTOCOL]' here and assign `Curl_easy->req.protop` to point to it.

  `->connect_it` allows a protocol to do some specific actions after the TCP
  connect is done, that can still be considered part of the connection phase.

  Some protocols will alter the `connectdata->recv[]` and
  `connectdata->send[]` function pointers in this function.

  `->connecting` is similarly a function that keeps getting called as long as
  the protocol considers itself still in the connecting phase.

  `->do_it` is the function called to issue the transfer request, which is
  what we call the DO action internally. If the DO is not enough and things
  need to be kept getting done for the entire DO sequence to complete,
  `->doing` is then usually also provided. Each protocol that needs to do
  multiple commands or similar for do/doing needs to implement its own state
  machine (see SCP, SFTP, FTP). Some protocols (only FTP and only due to
  historical reasons) have a separate piece of the DO state called `DO_MORE`.
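  The split between the first DO call and the repeated DOING calls can be
  sketched with a toy protocol that needs a fixed number of commands. All
  names are hypothetical, and the driver loop is a simplification: a real
  engine would return to the event loop and wait for socket activity between
  calls instead of spinning.

```c
#include <assert.h>

struct toy_proto {
  int commands_left; /* commands still to issue before DO is complete */
};

/* DO: issue the first command. *done is set nonzero when nothing is left. */
static void toy_do_it(struct toy_proto *p, int *done)
{
  assert(p && done && p->commands_left > 0);
  p->commands_left--;
  *done = (p->commands_left == 0);
}

/* DOING: issue the next command, called again until it reports done. */
static void toy_doing(struct toy_proto *p, int *done)
{
  assert(p && done);
  if(p->commands_left > 0)
    p->commands_left--;
  *done = (p->commands_left == 0);
}

/* Simplified driver. Returns how many DOING calls were needed. */
static int toy_run_do(struct toy_proto *p)
{
  int done = 0;
  int calls = 0;
  toy_do_it(p, &done);
  while(!done) {
    toy_doing(p, &done);
    calls++;
  }
  return calls;
}
```

  A protocol whose request fits in one command completes in the DO call and
  never needs DOING; one that needs three commands gets two DOING calls.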
986
  `->doing` keeps getting called while issuing the transfer request command(s).

  `->done` gets called when the transfer is complete and DONE. That's after the
  main data has been transferred.

  `->do_more` gets called during the `DO_MORE` state. The FTP protocol uses
  this state when setting up the second connection.

  `->proto_getsock`, `->doing_getsock`, `->domore_getsock` and
  `->perform_getsock` are functions that return socket information: which
  socket(s) to wait for which action(s) during the particular multi state.

  `->disconnect` is called immediately before the TCP connection is shut down.

  `->readwrite` gets called during transfer to allow the protocol to do extra
  reads/writes.

  `->defport` is the default TCP or UDP port this protocol uses.

  `->protocol` is one or more bits in the `CURLPROTO_*` set. The SSL versions
  have their "base" protocol set and then the SSL variation. Like
  "HTTP|HTTPS".

  `->flags` is a bitmask with additional information about the protocol that
  will make it get treated differently by the generic engine:

  - `PROTOPT_SSL` - will make it connect and negotiate SSL

  - `PROTOPT_DUAL` - this protocol uses two connections

  - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
    connection. This flag is no longer used by code, yet still set for a bunch
    of protocol handlers.

  - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
    limit which "direction" of socket actions that the main engine will
    concern itself with.

  - `PROTOPT_NONETWORK` - a protocol that doesn't use the network (that is,
    file:)

  - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
    one unless one is provided

  - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
    (?foo=bar)
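  How a generic engine might branch on such a bitmask is sketched below. The
  `TOY_PROTOPT_*` bit values and the flag combinations given to the two
  example handlers are illustrative assumptions; consult the libcurl source
  for the real `PROTOPT_*` defines and the per-protocol settings.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
#define TOY_PROTOPT_SSL        (1u << 0)
#define TOY_PROTOPT_DUAL       (1u << 1)
#define TOY_PROTOPT_NONETWORK  (1u << 4)
#define TOY_PROTOPT_NEEDSPWD   (1u << 5)
#define TOY_PROTOPT_NOURLQUERY (1u << 6)

struct toy_flagged_handler {
  const char *scheme;
  unsigned int flags; /* bitwise OR of TOY_PROTOPT_* */
};

/* Nonzero if the engine should negotiate SSL for this protocol. */
static int toy_needs_tls(const struct toy_flagged_handler *h)
{
  assert(h != 0);
  return (h->flags & TOY_PROTOPT_SSL) != 0;
}

/* Nonzero if the protocol uses the network at all. */
static int toy_uses_network(const struct toy_flagged_handler *h)
{
  assert(h != 0);
  return (h->flags & TOY_PROTOPT_NONETWORK) == 0;
}

/* Number of connections a transfer needs: two when the DUAL bit is set. */
static int toy_connection_count(const struct toy_flagged_handler *h)
{
  assert(h != 0);
  if(!toy_uses_network(h))
    return 0;
  return (h->flags & TOY_PROTOPT_DUAL) ? 2 : 1;
}
```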
1035
## conncache

  `conncache` is a hash table with connections for later re-use. Each
  `Curl_easy` has a pointer to its connection cache. Each multi handle sets up
  a connection cache that all added `Curl_easy`s share by default.

## Curl_share

  The libcurl share API allocates a `Curl_share` struct, exposed to the
  external API as "CURLSH *".

  The idea is that the struct can have a set of its own versions of caches and
  pools and then by providing this struct in the `CURLOPT_SHARE` option, those
  specific `Curl_easy`s will use the caches/pools that this share handle
  holds.

  Then individual `Curl_easy` structs can be made to share specific things
  that they otherwise wouldn't, such as cookies.

  The `Curl_share` struct can currently hold cookies, the DNS cache and the
  SSL session cache.

## CookieInfo

  This is the main cookie struct. It holds all known cookies and related
  information. Each `Curl_easy` has its own private CookieInfo even when
  it is added to a multi handle. Handles can be made to share cookies by using
  the share API.


[1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
[2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
[3]: https://c-ares.haxx.se/
[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
[5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
[6]: https://curl.haxx.se/docs/manpage.html#--compressed
[7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
[8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
[9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
[11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
[12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
[13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
[14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
[15]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2
1081