1curl internals
2==============
3
4 - [Intro](#intro)
5 - [git](#git)
6 - [Portability](#Portability)
7 - [Windows vs Unix](#winvsunix)
8 - [Library](#Library)
9   - [`Curl_connect`](#Curl_connect)
10   - [`Curl_do`](#Curl_do)
11   - [`Curl_readwrite`](#Curl_readwrite)
12   - [`Curl_done`](#Curl_done)
13   - [`Curl_disconnect`](#Curl_disconnect)
14 - [HTTP(S)](#http)
15 - [FTP](#ftp)
16   - [Kerberos](#kerberos)
17 - [TELNET](#telnet)
18 - [FILE](#file)
19 - [SMB](#smb)
20 - [LDAP](#ldap)
21 - [E-mail](#email)
22 - [General](#general)
23 - [Persistent Connections](#persistent)
24 - [multi interface/non-blocking](#multi)
25 - [SSL libraries](#ssl)
26 - [Library Symbols](#symbols)
27 - [Return Codes and Informationals](#returncodes)
 - [API/ABI](#abi)
29 - [Client](#client)
30 - [Memory Debugging](#memorydebug)
31 - [Test Suite](#test)
32 - [Asynchronous name resolves](#asyncdns)
33   - [c-ares](#cares)
34 - [`curl_off_t`](#curl_off_t)
35 - [curlx](#curlx)
36 - [Content Encoding](#contentencoding)
37 - [hostip.c explained](#hostip)
38 - [Track Down Memory Leaks](#memoryleak)
39 - [`multi_socket`](#multi_socket)
40 - [Structs in libcurl](#structs)
41
42<a name="intro"></a>
43Intro
44=====
45
46 This project is split in two. The library and the client. The client part
47 uses the library, but the library is designed to allow other applications to
48 use it.
49
50 The largest amount of code and complexity is in the library part.
51
52
53<a name="git"></a>
54git
55===
56
57 All changes to the sources are committed to the git repository as soon as
58 they're somewhat verified to work. Changes shall be committed as independently
59 as possible so that individual changes can be easily spotted and tracked
60 afterwards.
61
62 Tagging shall be used extensively, and by the time we release new archives we
63 should tag the sources with a name similar to the released version number.
64
65<a name="Portability"></a>
66Portability
67===========
68
 We write curl and libcurl to compile with C89 compilers, on 32-bit and larger
 machines. Most of libcurl assumes more or less POSIX compliance but that's
 not a requirement.

 We write libcurl to build and work with lots of third party tools, and we
 want it to remain functional and buildable with these and later versions
 (older versions may still work but that is not what we work hard to maintain):
76
77Dependencies
78------------
79
80 - OpenSSL      0.9.7
81 - GnuTLS       1.2
82 - zlib         1.1.4
83 - libssh2      0.16
84 - c-ares       1.6.0
85 - libidn2      2.0.0
86 - cyassl       2.0.0
87 - openldap     2.0
88 - MIT Kerberos 1.2.4
89 - GSKit        V5R3M0
90 - NSS          3.14.x
91 - axTLS        2.1.0
92 - PolarSSL     1.3.0
93 - Heimdal      ?
94 - nghttp2      1.0.0
95
96Operating Systems
97-----------------
98
99 On systems where configure runs, we aim at working on them all - if they have
100 a suitable C compiler. On systems that don't run configure, we strive to keep
101 curl running correctly on:
102
103 - Windows      98
104 - AS/400       V5R3M0
105 - Symbian      9.1
106 - Windows CE   ?
107 - TPF          ?
108
109Build tools
110-----------
111
112 When writing code (mostly for generating stuff included in release tarballs)
113 we use a few "build tools" and we make sure that we remain functional with
114 these versions:
115
116 - GNU Libtool  1.4.2
117 - GNU Autoconf 2.57
118 - GNU Automake 1.7
119 - GNU M4       1.4
120 - perl         5.004
121 - roffit       0.5
122 - groff        ? (any version that supports "groff -Tps -man [in] [out]")
123 - ps2pdf (gs)  ?
124
125<a name="winvsunix"></a>
126Windows vs Unix
127===============
128
129 There are a few differences in how to program curl the Unix way compared to
130 the Windows way. Perhaps the four most notable details are:
131
132 1. Different function names for socket operations.
133
134   In curl, this is solved with defines and macros, so that the source looks
135   the same in all places except for the header file that defines them. The
136   macros in use are sclose(), sread() and swrite().
137
138 2. Windows requires a couple of init calls for the socket stuff.
139
   That's taken care of by the `curl_global_init()` call, but if other
   libraries also initialize the socket layer, applications might have reasons
   to alter that behaviour.
143
144 3. The file descriptors for network communication and file operations are
145    not as easily interchangeable as in Unix.
146
147   We avoid this by not trying any funny tricks on file descriptors.
148
149 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
150    destroying binary data, although you do want that conversion if it is
151    text coming through... (sigh)
152
   We set stdout to binary mode under Windows.
154
 Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
 conditionals that deal with features *should* instead be in the format
 `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
 we maintain a `curl_config-win32.h` file in the lib directory that is supposed
 to look exactly like the `curl_config.h` file that configure would have
 generated on a Windows machine!
161
162 Generally speaking: always remember that this will be compiled on dozens of
163 operating systems. Don't walk on the edge!
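
 The socket macro approach from point 1 can be sketched roughly as below. This
 is a minimal illustration, assuming a POSIX or Winsock environment; the exact
 definitions in curl's own headers may differ, and `demo_socket_t` and
 `demo_roundtrip()` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET demo_socket_t;
#define sclose(s)       closesocket(s)
#define sread(s, b, n)  recv((s), (char *)(b), (int)(n), 0)
#define swrite(s, b, n) send((s), (const char *)(b), (int)(n), 0)
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int demo_socket_t;
#define sclose(s)       close(s)
#define sread(s, b, n)  recv((s), (b), (n), 0)
#define swrite(s, b, n) send((s), (b), (n), 0)
#endif

/* Send a message on one socket and read it back from its connected peer.
   Only the macros are used, so this function is identical on all platforms.
   Returns the number of bytes read, or -1 on error. */
static long demo_roundtrip(demo_socket_t out, demo_socket_t in,
                           const char *msg, size_t len,
                           char *buf, size_t buflen)
{
  assert(msg != NULL && buf != NULL);
  if((size_t)swrite(out, msg, len) != len)
    return -1;
  return (long)sread(in, buf, buflen);
}
```

 The calling code never mentions `close()` or `closesocket()` directly; only
 the header that defines the macros knows which platform it is built for.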
164
165<a name="Library"></a>
166Library
167=======
168
169 (See [Structs in libcurl](#structs) for the separate section describing all
170 major internal structs and their purposes.)
171
172 There are plenty of entry points to the library, namely each publicly defined
173 function that libcurl offers to applications. All of those functions are
174 rather small and easy-to-follow. All the ones prefixed with `curl_easy` are
175 put in the lib/easy.c file.
176
177 `curl_global_init()` and `curl_global_cleanup()` should be called by the
178 application to initialize and clean up global stuff in the library. As of
179 today, it can handle the global SSL initing if SSL is enabled and it can init
180 the socket layer on windows machines. libcurl itself has no "global" scope.
181
182 All printf()-style functions use the supplied clones in lib/mprintf.c. This
183 makes sure we stay absolutely platform independent.
184
 [`curl_easy_init()`][2] allocates an internal struct and makes some
186 initializations.  The returned handle does not reveal internals. This is the
187 `Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
188 functions. All connections performed will get connect-specific data allocated
189 that should be used for things related to particular connections/requests.
190
191 [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
192 be passed in pairs: the parameter-ID and the parameter-value. The list of
193 options is documented in the man page. This function mainly sets things in
194 the `Curl_easy` struct.
195
196 `curl_easy_perform()` is just a wrapper function that makes use of the multi
197 API.  It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
198 `curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
199 and then returns.
200
201 Some of the most important key functions in url.c are called from multi.c
202 when certain key steps are to be made in the transfer operation.
203
204<a name="Curl_connect"></a>
205Curl_connect()
206--------------
207
   Analyzes the URL, separates its different components and connects to the
   remote host. This may involve using a proxy and/or using SSL. The
   `Curl_resolv()` function in lib/hostip.c is used for looking up host names
   (it then uses the proper underlying method, which may vary between
   platforms and builds).
213
214   When `Curl_connect` is done, we are connected to the remote site. Then it
215   is time to tell the server to get a document/file. `Curl_do()` arranges
216   this.
217
218   This function makes sure there's an allocated and initiated 'connectdata'
219   struct that is used for this particular connection only (although there may
220   be several requests performed on the same connect). A bunch of things are
221   inited/inherited from the `Curl_easy` struct.
222
223<a name="Curl_do"></a>
224Curl_do()
225---------
226
227   `Curl_do()` makes sure the proper protocol-specific function is called. The
228   functions are named after the protocols they handle.
229
   The protocol-specific functions of course deal with protocol-specific
   negotiations and setup. They have access to the `Curl_sendf()` (from
   lib/sendf.c) function to send printf-style formatted data to the remote
   host, and when they're ready to make the actual file transfer they call the
   `Curl_Transfer()` function (in lib/transfer.c) to set up the transfer and
   then return.
236
237   If this DO function fails and the connection is being re-used, libcurl will
238   then close this connection, setup a new connection and re-issue the DO
239   request on that. This is because there is no way to be perfectly sure that
240   we have discovered a dead connection before the DO function and thus we
241   might wrongly be re-using a connection that was closed by the remote peer.
242
243   Some time during the DO function, the `Curl_setup_transfer()` function must
244   be called with some basic info about the upcoming transfer: what socket(s)
245   to read/write and the expected file transfer sizes (if known).
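
   The dispatch step can be pictured as a lookup in a table of per-protocol
   functions. This is only a sketch of the general technique: every type and
   function name below (`demo_conn`, `demo_handler`, `demo_do()`) is
   hypothetical and does not mirror libcurl's internal structs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int demo_code;            /* stand-in for CURLcode; 0 means OK */

struct demo_conn {                /* stand-in for the connection struct */
  const char *scheme;
  const char *last_action;
};

typedef demo_code (*demo_do_fn)(struct demo_conn *conn);

static demo_code demo_http_do(struct demo_conn *conn)
{
  conn->last_action = "http request sent";
  return 0;
}

static demo_code demo_ftp_do(struct demo_conn *conn)
{
  conn->last_action = "ftp command sent";
  return 0;
}

struct demo_handler {
  const char *scheme;
  demo_do_fn do_it;
};

static const struct demo_handler demo_handlers[] = {
  { "http", demo_http_do },
  { "ftp",  demo_ftp_do }
};

/* Call the DO function that belongs to the connection's protocol. */
static demo_code demo_do(struct demo_conn *conn)
{
  size_t i;
  assert(conn != NULL && conn->scheme != NULL);
  for(i = 0; i < sizeof(demo_handlers) / sizeof(demo_handlers[0]); i++) {
    if(strcmp(conn->scheme, demo_handlers[i].scheme) == 0)
      return demo_handlers[i].do_it(conn);
  }
  return 1; /* unsupported protocol */
}
```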
246
247<a name="Curl_readwrite"></a>
248Curl_readwrite()
249----------------
250
251   Called during the transfer of the actual protocol payload.
252
253   During transfer, the progress functions in lib/progress.c are called at
254   frequent intervals (or at the user's choice, a specified callback might get
255   called). The speedcheck functions in lib/speedcheck.c are also used to
256   verify that the transfer is as fast as required.
257
258<a name="Curl_done"></a>
259Curl_done()
260-----------
261
262   Called after a transfer is done. This function takes care of everything
263   that has to be done after a transfer. This function attempts to leave
264   matters in a state so that `Curl_do()` should be possible to call again on
265   the same connection (in a persistent connection case). It might also soon
266   be closed with `Curl_disconnect()`.
267
268<a name="Curl_disconnect"></a>
269Curl_disconnect()
270-----------------
271
272   When doing normal connections and transfers, no one ever tries to close any
273   connections so this is not normally called when `curl_easy_perform()` is
274   used. This function is only used when we are certain that no more transfers
275   are going to be made on the connection. It can be also closed by force, or
276   it can be called to make sure that libcurl doesn't keep too many
277   connections alive at the same time.
278
279   This function cleans up all resources that are associated with a single
280   connection.
281
282<a name="http"></a>
283HTTP(S)
284=======
285
286 HTTP offers a lot and is the protocol in curl that uses the most lines of
287 code. There is a special file (lib/formdata.c) that offers all the multipart
288 post functions.
289
 The base64 functions for user+password stuff (and more) are in lib/base64.c
 and all functions for parsing and sending cookies are found in lib/cookie.c.
292
293 HTTPS uses in almost every case the same procedure as HTTP, with only two
294 exceptions: the connect procedure is different and the function used to read
295 or write from the socket is different, although the latter fact is hidden in
296 the source by the use of `Curl_read()` for reading and `Curl_write()` for
297 writing data to the remote server.
298
 `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
 encoding.
301
 An interesting detail of the HTTP(S) request is the `Curl_add_buffer()`
 series of functions we use. They append data to one single buffer, and when
 the building is finished the entire request is sent off in one single write.
 This is done to overcome problems with flawed firewalls and lame servers.
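
 The single-buffer technique can be sketched as follows. This is an
 illustration under assumptions, not the `Curl_add_buffer()` code itself:
 `demo_buf`, `demo_buf_add()` and `demo_build_request()` are hypothetical
 names, and the growth strategy is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_buf {
  char *data;
  size_t len;   /* bytes used, not counting the trailing NUL */
  size_t cap;   /* bytes allocated */
};

/* Append len bytes; returns 0 on success, -1 on allocation failure. */
static int demo_buf_add(struct demo_buf *b, const char *src, size_t len)
{
  assert(b != NULL && src != NULL);
  if(b->len + len + 1 > b->cap) {
    size_t newcap = (b->len + len + 1) * 2;
    char *p = realloc(b->data, newcap);
    if(!p)
      return -1;
    b->data = p;
    b->cap = newcap;
  }
  memcpy(b->data + b->len, src, len);
  b->len += len;
  b->data[b->len] = '\0';
  return 0;
}

static int demo_buf_adds(struct demo_buf *b, const char *s)
{
  return demo_buf_add(b, s, strlen(s));
}

/* Build a complete GET request so the caller can send it in one write. */
static int demo_build_request(struct demo_buf *b, const char *host,
                              const char *path)
{
  if(demo_buf_adds(b, "GET ") || demo_buf_adds(b, path) ||
     demo_buf_adds(b, " HTTP/1.1\r\nHost: ") || demo_buf_adds(b, host) ||
     demo_buf_adds(b, "\r\n\r\n"))
    return -1;
  return 0;
}
```

 Once the request is complete, `b.data` and `b.len` can be handed to a single
 write call instead of issuing one write per header line.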
305
306<a name="ftp"></a>
307FTP
308===
309
310 The `Curl_if2ip()` function can be used for getting the IP number of a
311 specified network interface, and it resides in lib/if2ip.c.
312
313 `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
314 was made a separate function to prevent us programmers from forgetting that
315 they must be CRLF terminated. They must also be sent in one single write() to
316 make firewalls and similar happy.
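
 A minimal sketch of the idea behind such a helper: format the command and
 append CRLF in the same buffer, so the caller has one complete line to pass
 to a single write. `demo_ftp_format()` is a hypothetical name and the code
 assumes a C99 `vsnprintf()`; it is not the `Curl_ftpsendf()` implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format an FTP command into buf and terminate it with CRLF.
   Returns the total length, or -1 if the result does not fit. */
static int demo_ftp_format(char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int n;

  assert(buf != NULL && fmt != NULL);
  if(size < 3)
    return -1;

  /* leave room for the two CRLF bytes */
  va_start(ap, fmt);
  n = vsnprintf(buf, size - 2, fmt, ap);
  va_end(ap);
  if(n < 0 || (size_t)n >= size - 2)
    return -1;

  memcpy(buf + n, "\r\n", 3); /* CR, LF and the terminating NUL */
  return n + 2;
}
```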
317
318<a name="kerberos"></a>
319Kerberos
320--------
321
322 Kerberos support is mainly in lib/krb5.c and lib/security.c but also
323 `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
324 `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.
325
326<a name="telnet"></a>
327TELNET
328======
329
330 Telnet is implemented in lib/telnet.c.
331
332<a name="file"></a>
333FILE
334====
335
336 The file:// protocol is dealt with in lib/file.c.
337
338<a name="smb"></a>
339SMB
340===
341
342 The smb:// protocol is dealt with in lib/smb.c.
343
344<a name="ldap"></a>
345LDAP
346====
347
 Everything LDAP is in lib/ldap.c and lib/openldap.c.
349
350<a name="email"></a>
351E-mail
352======
353
354 The e-mail related source code is in lib/imap.c, lib/pop3.c and lib/smtp.c.
355
356<a name="general"></a>
357General
358=======
359
360 URL encoding and decoding, called escaping and unescaping in the source code,
361 is found in lib/escape.c.
362
363 While transferring data in Transfer() a few functions might get used.
364 `curl_getdate()` in lib/parsedate.c is for HTTP date comparisons (and more).
365
366 lib/getenv.c offers `curl_getenv()` which is for reading environment
367 variables in a neat platform independent way. That's used in the client, but
368 also in lib/url.c when checking the proxy environment variables. Note that
369 contrary to the normal unix getenv(), this returns an allocated buffer that
370 must be free()ed after use.
371
 lib/netrc.c holds the .netrc parser.
373
374 lib/timeval.c features replacement functions for systems that don't have
375 gettimeofday() and a few support functions for timeval conversions.
376
377 A function named `curl_version()` that returns the full curl version string
378 is found in lib/version.c.
379
380<a name="persistent"></a>
381Persistent Connections
382======================
383
384 The persistent connection support in libcurl requires some considerations on
385 how to do things inside of the library.
386
387 - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
388   must never hold connection-oriented data. It is meant to hold the root data
389   as well as all the options etc that the library-user may choose.
390
391 - The `Curl_easy` struct holds the "connection cache" (an array of
392   pointers to 'connectdata' structs).
393
394 - This enables the 'curl handle' to be reused on subsequent transfers.
395
396 - When libcurl is told to perform a transfer, it first checks for an already
397   existing connection in the cache that we can use. Otherwise it creates a
398   new one and adds that to the cache. If the cache is full already when a new
399   connection is added, it will first close the oldest unused one.
400
401 - When the transfer operation is complete, the connection is left
402   open. Particular options may tell libcurl not to, and protocols may signal
403   closure on connections and then they won't be kept open, of course.
404
405 - When `curl_easy_cleanup()` is called, we close all still opened connections,
406   unless of course the multi interface "owns" the connections.
407
408 The curl handle must be re-used in order for the persistent connections to
409 work.
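
 The cache behaviour described in the list above (reuse a matching idle
 connection, otherwise add one, evicting the oldest unused entry when full)
 can be sketched like this. All names and the tiny fixed cache size are
 hypothetical; libcurl's real connection cache is organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CACHE_SIZE 2

struct demo_conn {
  char host[64];
  int port;
  int open;            /* slot holds a live connection */
  int in_use;          /* a transfer currently owns the connection */
  unsigned long stamp; /* value of the cache clock at last use */
};

struct demo_cache {
  struct demo_conn slot[DEMO_CACHE_SIZE];
  unsigned long clock; /* increases on every use */
};

/* Return a connection to host:port. *reused is set to 1 on a cache hit.
   Returns NULL if every slot is busy with a transfer. */
static struct demo_conn *demo_get_conn(struct demo_cache *c, const char *host,
                                       int port, int *reused)
{
  size_t i;
  struct demo_conn *victim = NULL;

  assert(c != NULL && host != NULL && reused != NULL);
  assert(strlen(host) < sizeof(c->slot[0].host));
  *reused = 0;

  for(i = 0; i < DEMO_CACHE_SIZE; i++) {
    struct demo_conn *k = &c->slot[i];
    if(k->open && !k->in_use && k->port == port && !strcmp(k->host, host)) {
      *reused = 1;
      k->in_use = 1;
      k->stamp = ++c->clock;
      return k;
    }
  }

  /* no match: prefer a free slot, else the oldest unused connection */
  for(i = 0; i < DEMO_CACHE_SIZE; i++) {
    struct demo_conn *k = &c->slot[i];
    if(!k->open) {
      victim = k;
      break;
    }
    if(!k->in_use && (!victim || k->stamp < victim->stamp))
      victim = k;
  }
  if(!victim)
    return NULL;

  /* a real implementation would close the evicted connection here */
  strcpy(victim->host, host);
  victim->port = port;
  victim->open = 1;
  victim->in_use = 1;
  victim->stamp = ++c->clock;
  return victim;
}

/* Mark the connection idle again; it stays open in the cache. */
static void demo_release_conn(struct demo_conn *k)
{
  k->in_use = 0;
}
```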
410
411<a name="multi"></a>
412multi interface/non-blocking
413============================
414
 The multi interface is a non-blocking interface to the library. To make that
 interface work as well as possible, no low-level function within libcurl
 may be written to work in a blocking manner. (There are still a few spots
 violating this rule.)
419
420 One of the primary reasons we introduced c-ares support was to allow the name
421 resolve phase to be perfectly non-blocking as well.
422
423 The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
424 the code to allow non-blocking operations even on multi-stage command-
425 response protocols. They are built around state machines that return when
 they would otherwise block waiting for data. The DICT, LDAP and TELNET
 protocols are crappy examples and they are subject to a rewrite in the future
 to better fit the libcurl protocol family.
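
 A state machine of this kind can be sketched as below, using a simplified FTP
 login (reply codes 220, 331 and 230) as the multi-stage exchange. The names
 are hypothetical and the sketch only tracks state; it does not touch a
 socket, and it is not taken from lib/ftp.c.

```c
#include <assert.h>
#include <stddef.h>

enum demo_state {
  DEMO_WAIT_GREETING,
  DEMO_WAIT_USER_REPLY,
  DEMO_WAIT_PASS_REPLY,
  DEMO_DONE
};

struct demo_login {
  enum demo_state state;
  int commands_sent;
};

/* Advance the login by at most one step. 'reply' is the next server reply
   code, or 0 if none has arrived yet. Returns 1 when the caller should wait
   for more data, 0 when the login is done and -1 on a protocol error.
   The function never blocks. */
static int demo_login_step(struct demo_login *m, int reply)
{
  assert(m != NULL);
  if(m->state == DEMO_DONE)
    return 0;
  if(reply == 0)
    return 1; /* would block: return and let the caller wait */

  switch(m->state) {
  case DEMO_WAIT_GREETING:
    if(reply != 220)
      return -1;
    m->commands_sent++; /* USER would be sent here */
    m->state = DEMO_WAIT_USER_REPLY;
    return 1;
  case DEMO_WAIT_USER_REPLY:
    if(reply == 230) {  /* logged in without a password */
      m->state = DEMO_DONE;
      return 0;
    }
    if(reply != 331)
      return -1;
    m->commands_sent++; /* PASS would be sent here */
    m->state = DEMO_WAIT_PASS_REPLY;
    return 1;
  case DEMO_WAIT_PASS_REPLY:
    if(reply != 230)
      return -1;
    m->state = DEMO_DONE;
    return 0;
  default:
    return -1;
  }
}
```

 Because the current state lives in the struct rather than on the call stack,
 the caller is free to serve other transfers between two calls.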
429
430<a name="ssl"></a>
431SSL libraries
432=============
433
 Originally libcurl supported SSLeay for SSL/TLS transports. That support was
 then extended to its successor OpenSSL and has since been extended to several
 other SSL/TLS libraries. We expect and hope to further extend the support in
 future libcurl versions.
438
439 To deal with this internally in the best way possible, we have a generic SSL
440 function API as provided by the vtls/vtls.[ch] system, and they are the only
441 SSL functions we must use from within libcurl. vtls is then crafted to use
442 the appropriate lower-level function calls to whatever SSL library that is in
443 use. For example vtls/openssl.[ch] for the OpenSSL library.
444
445<a name="symbols"></a>
446Library Symbols
447===============
448
449 All symbols used internally in libcurl must use a `Curl_` prefix if they're
450 used in more than a single file. Single-file symbols must be made static.
451 Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
452 but they are to be changed to follow this pattern in future versions.) Public
453 API functions are marked with `CURL_EXTERN` in the public header files so
454 that all others can be hidden on platforms where this is possible.
455
456<a name="returncodes"></a>
457Return Codes and Informationals
458===============================
459
460 I've made things simple. Almost every function in libcurl returns a CURLcode,
461 that must be `CURLE_OK` if everything is OK or otherwise a suitable error
462 code as the curl/curl.h include file defines. The very spot that detects an
463 error must use the `Curl_failf()` function to set the human-readable error
464 description.
465
466 In aiding the user to understand what's happening and to debug curl usage, we
467 must supply a fair number of informational messages by using the
468 `Curl_infof()` function. Those messages are only displayed when the user
469 explicitly asks for them. They are best used when revealing information that
470 isn't otherwise obvious.
471
472<a name="abi"></a>
473API/ABI
474=======
475
476 We make an effort to not export or show internals or how internals work, as
477 that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
478 for our promise to users.
479
480<a name="client"></a>
481Client
482======
483
484 main() resides in `src/tool_main.c`.
485
486 `src/tool_hugehelp.c` is automatically generated by the mkhelp.pl perl script
487 to display the complete "manual" and the `src/tool_urlglob.c` file holds the
488 functions used for the URL-"globbing" support. Globbing in the sense that the
489 {} and [] expansion stuff is there.
490
491 The client mostly sets up its 'config' struct properly, then
492 it calls the `curl_easy_*()` functions of the library and when it gets back
493 control after the `curl_easy_perform()` it cleans up the library, checks
494 status and exits.
495
496 When the operation is done, the ourWriteOut() function in src/writeout.c may
497 be called to report about the operation. That function is using the
498 `curl_easy_getinfo()` function to extract useful information from the curl
499 session.
500
501 It may loop and do all this several times if many URLs were specified on the
502 command line or config file.
503
504<a name="memorydebug"></a>
505Memory Debugging
506================
507
 The file lib/memdebug.c contains debug versions of a few functions: functions
 such as malloc, free, fopen, fclose, etc that somehow deal with resources
 that might give us problems if we "leak" them. The functions in the memdebug
 system do nothing fancy, they do their normal function and then log
 information about what they just did. The logged data can then be analyzed
 after a complete session.
514
515 memanalyze.pl is the perl script present in tests/ that analyzes a log file
516 generated by the memory tracking system. It detects if resources are
517 allocated but never freed and other kinds of errors related to resource
518 management.
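
 The "do the normal thing, then log it" pattern can be sketched as follows.
 The function names and the log line format here are hypothetical and chosen
 for illustration; they are not the format that memanalyze.pl parses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static FILE *demo_logfile;   /* NULL disables logging */

/* Select where the allocation trace is written. */
static void demo_memdebug(FILE *log)
{
  demo_logfile = log;
}

/* malloc() wrapper that records the size, result and call site. */
static void *demo_dbg_malloc(size_t size, int line, const char *source)
{
  void *mem = malloc(size);
  assert(source != NULL);
  if(demo_logfile)
    fprintf(demo_logfile, "MEM %s:%d malloc(%lu) = %p\n",
            source, line, (unsigned long)size, mem);
  return mem;
}

/* free() wrapper that records the pointer and call site. */
static void demo_dbg_free(void *ptr, int line, const char *source)
{
  assert(source != NULL);
  if(demo_logfile)
    fprintf(demo_logfile, "MEM %s:%d free(%p)\n", source, line, ptr);
  free(ptr);
}

/* Code under test calls these instead of malloc()/free(). */
#define demo_malloc(size) demo_dbg_malloc((size), __LINE__, __FILE__)
#define demo_free(ptr)    demo_dbg_free((ptr), __LINE__, __FILE__)
```

 An analysis script can then pair every logged allocation with its matching
 free and report the pointers that were never released.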
519
 Internally, the preprocessor symbol DEBUGBUILD guards code that is only
 compiled for debug-enabled builds, and the symbol CURLDEBUG marks code that
 is _only_ used for memory tracking/debugging.

 Use -DCURLDEBUG when compiling to enable memory debugging. This is also
 switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
 when compiling to enable a debug build, or run configure with --enable-debug.
527
528 curl --version will list 'Debug' feature for debug enabled builds, and
529 will list 'TrackMemory' feature for curl debug memory tracking capable
530 builds. These features are independent and can be controlled when running
531 the configure script. When --enable-debug is given both features will be
532 enabled, unless some restriction prevents memory tracking from being used.
533
534<a name="test"></a>
535Test Suite
536==========
537
538 The test suite is placed in its own subdirectory directly off the root in the
539 curl archive tree, and it contains a bunch of scripts and a lot of test case
540 data.
541
542 The main test script is runtests.pl that will invoke test servers like
543 httpserver.pl and ftpserver.pl before all the test cases are performed. The
544 test suite currently only runs on Unix-like platforms.
545
 You'll find a description of the test suite in the tests/README file, and a
 description of the test case data file format in the tests/FILEFORMAT file.
548
549 The test suite automatically detects if curl was built with the memory
550 debugging enabled, and if it was, it will detect memory leaks, too.
551
552<a name="asyncdns"></a>
553Asynchronous name resolves
554==========================
555
556 libcurl can be built to do name resolves asynchronously, using either the
557 normal resolver in a threaded manner or by using c-ares.
558
559<a name="cares"></a>
560[c-ares][3]
561------
562
### Build libcurl to use c-ares
564
5651. ./configure --enable-ares=/path/to/ares/install
5662. make
567
568### c-ares on win32
569
570 First I compiled c-ares. I changed the default C runtime library to be the
571 single-threaded rather than the multi-threaded (this seems to be required to
572 prevent linking errors later on). Then I simply build the areslib project
573 (the other projects adig/ahost seem to fail under MSVC).
574
575 Next was libcurl. I opened lib/config-win32.h and I added a:
576 `#define USE_ARES 1`
577
578 Next thing I did was I added the path for the ares includes to the include
579 path, and the libares.lib to the libraries.
580
581 Lastly, I also changed libcurl to be single-threaded rather than
582 multi-threaded, again this was to prevent some duplicate symbol errors. I'm
583 not sure why I needed to change everything to single-threaded, but when I
584 didn't I got redefinition errors for several CRT functions (malloc, stricmp,
585 etc.)
586
587<a name="curl_off_t"></a>
588`curl_off_t`
589==========
590
591 `curl_off_t` is a data type provided by the external libcurl include
592 headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
 options that end with LARGE. The type is 64 bits wide on most modern
 platforms.
595
<a name="curlx"></a>
curlx
597=====
598
599 The libcurl source code offers a few functions by source only. They are not
600 part of the official libcurl API, but the source files might be useful for
601 others so apps can optionally compile/build with these sources to gain
602 additional functions.
603
604 We provide them through a single header file for easy access for apps:
605 "curlx.h"
606
607`curlx_strtoofft()`
608-------------------
   A macro that converts a string containing a number to a `curl_off_t` number.
   This might use the `curlx_strtoll()` function which is provided as source
   code in strtoofft.c. Note that the function is only provided if no
   strtoll() (or equivalent) function exists on your platform. If `curl_off_t`
   is only a 32 bit number on your platform, this macro uses strtol().
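
   As a rough illustration of what such a conversion involves on a platform
   that does have strtoll(), consider the sketch below. `demo_off_t` and
   `demo_str_to_off()` are hypothetical stand-ins; this is not the
   `curlx_strtoofft()` macro, and it assumes a 64-bit `long long`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef long long demo_off_t;   /* stand-in for a 64-bit curl_off_t */

/* Parse a decimal number into *out. Returns 0 on success and -1 when there
   are no digits, trailing garbage or the value is out of range. */
static int demo_str_to_off(const char *str, demo_off_t *out)
{
  char *end;
  long long v;

  assert(str != NULL && out != NULL);
  errno = 0;
  v = strtoll(str, &end, 10);
  if(end == str || *end != '\0' || errno == ERANGE)
    return -1;
  *out = v;
  return 0;
}
```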
614
615Future
616------
617
618 Several functions will be removed from the public `curl_` name space in a
619 future libcurl release. They will then only become available as `curlx_`
620 functions instead. To make the transition easier, we already today provide
621 these functions with the `curlx_` prefix to allow sources to be built
622 properly with the new function names. The concerned functions are:
623
624 - `curlx_getenv`
625 - `curlx_strequal`
626 - `curlx_strnequal`
627 - `curlx_mvsnprintf`
628 - `curlx_msnprintf`
629 - `curlx_maprintf`
630 - `curlx_mvaprintf`
631 - `curlx_msprintf`
632 - `curlx_mprintf`
633 - `curlx_mfprintf`
634 - `curlx_mvsprintf`
635 - `curlx_mvprintf`
636 - `curlx_mvfprintf`
637
638<a name="contentencoding"></a>
639Content Encoding
640================
641
642## About content encodings
643
644 [HTTP/1.1][4] specifies that a client may request that a server encode its
645 response. This is usually used to compress a response using one (or more)
646 encodings from a set of commonly available compression techniques. These
 schemes include 'deflate' (the zlib algorithm), 'gzip', 'br' (brotli) and
 'compress'. A client requests that the server perform an encoding by including
649 an Accept-Encoding header in the request document. The value of the header
650 should be one of the recognized tokens 'deflate', ... (there's a way to
651 register new schemes/tokens, see sec 3.5 of the spec). A server MAY honor
652 the client's encoding request. When a response is encoded, the server
653 includes a Content-Encoding header in the response. The value of the
654 Content-Encoding header indicates which encodings were used to encode the
655 data, in the order in which they were applied.
656
657 It's also possible for a client to attach priorities to different schemes so
658 that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
659 information on the Accept-Encoding header. See sec [3.1.2.2 of RFC 7231][15]
660 for more information on the Content-Encoding header.
661
662## Supported content encodings
663
 The 'deflate', 'gzip' and 'br' content encodings are supported by libcurl.
 Both regular and chunked transfers work fine.  The zlib library is required
 for the 'deflate' and 'gzip' encodings, while the brotli decoding library is
 required for the 'br' encoding.
668
669## The libcurl interface
670
671 To cause libcurl to request a content encoding use:
672
673  [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)
674
675 where string is the intended value of the Accept-Encoding header.
676
 Currently, libcurl does support multiple encodings but only
 understands how to process responses that use the "deflate", "gzip" and/or
 "br" content encodings, so the only values for [`CURLOPT_ACCEPT_ENCODING`][5]
 that will work (besides "identity", which does nothing) are "deflate",
 "gzip" and "br". If a response is encoded using "compress" or any other
 unsupported method, libcurl will return an error indicating that the response
 could not be decoded.  If `string` is NULL no Accept-Encoding header is
 generated. If `string` is a zero-length string, then an Accept-Encoding header
 containing all supported encodings will be generated.
686
687 The [`CURLOPT_ACCEPT_ENCODING`][5] must be set to any non-NULL value for
688 content to be automatically decoded.  If it is not set and the server still
689 sends encoded content (despite not having been asked), the data is returned
690 in its raw form and the Content-Encoding type is not checked.
691
692## The curl interface
693
694 Use the [--compressed][6] option with curl to cause it to ask servers to
695 compress responses using any format supported by curl.
696
697<a name="hostip"></a>
698hostip.c explained
699==================
700
 The main compile-time defines to keep in mind when reading the host*.c source
 files are these:
703
704## `CURLRES_IPV6`
705
706 this host has getaddrinfo() and family, and thus we use that. The host may
707 not be able to resolve IPv6, but we don't really have to take that into
708 account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.
709
710## `CURLRES_ARES`
711
712 is defined if libcurl is built to use c-ares for asynchronous name
713 resolves. This can be Windows or *nix.
714
715## `CURLRES_THREADED`
716
717 is defined if libcurl is built to use threading for asynchronous name
718 resolves. The name resolve will be done in a new thread, and the supported
719 asynch API will be the same as for ares-builds. This is the default under
720 (native) Windows.
721
 If either of the two previous symbols is defined, `CURLRES_ASYNCH` is defined
 too. If libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH`
 is defined.
725
726## host*.c sources
727
 The host*.c source files are split up like this:
729
730 - hostip.c      - method-independent resolver functions and utility functions
731 - hostasyn.c    - functions for asynchronous name resolves
732 - hostsyn.c     - functions for synchronous name resolves
733 - asyn-ares.c   - functions for asynchronous name resolves using c-ares
734 - asyn-thread.c - functions for asynchronous name resolves using threads
735 - hostip4.c     - IPv4 specific functions
736 - hostip6.c     - IPv6 specific functions
737
738 The hostip.h is the single united header file for all this. It defines the
739 `CURLRES_*` defines based on the config*.h and `curl_setup.h` defines.
740
741<a name="memoryleak"></a>
742Track Down Memory Leaks
743=======================
744
745## Single-threaded
746
  Please note that this memory leak system is not adjusted to work in more
  than one thread. If you want/need to use it in a multi-threaded app, please
  adjust accordingly.
750
751
752## Build
753
754  Rebuild libcurl with -DCURLDEBUG (usually, rerunning configure with
755  --enable-debug fixes this). 'make clean' first, then 'make' so that all
756  files are actually rebuilt properly. It will also make sense to build
757  libcurl with the debug option (usually -g to the compiler) so that debugging
758  it will be easier if you actually do find a leak in the library.
759
760  This will create a library that has memory debugging enabled.
761
762## Modify Your Application
763
764  Add a line in your application code:
765
766       `curl_memdebug("dump");`
767
768  This will make the malloc debug system output a full trace of all resource
769  using functions to the given file name. Make sure you rebuild your program
770  and that you link with the same libcurl you built for this purpose as
771  described above.
772
773## Run Your Application
774
775  Run your program as usual. Watch the specified memory trace file grow.
776
  Make your program exit and use the proper libcurl cleanup functions etc, so
  that all non-leaks are returned/freed properly.
779
780## Analyze the Flow
781
782  Use the tests/memanalyze.pl perl script to analyze the dump file:
783
784    tests/memanalyze.pl dump
785
  This now outputs a report on which resources were allocated but never
  freed etc. This report is very fine for posting to the list!

  If this doesn't produce any output, no leak was detected in libcurl. Then
  the leak is most likely to be in your code.
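
  As a rough illustration of what analyzing such a trace amounts to, the
  sketch below pairs each allocation with a later free of the same address
  and counts what is left over. It is not the algorithm or the trace format
  used by memanalyze.pl; the `mem_event` type and `count_leaks()` are
  hypothetical names made up for this example.

```c
#include <stddef.h>

/* Hypothetical trace record: one allocation or one free of an address. */
enum mem_op { MEM_ALLOC, MEM_FREE };

struct mem_event {
  enum mem_op op;
  unsigned long addr;   /* address returned by, or passed to, the allocator */
};

/* Return the number of allocations in the trace that have no later free. */
size_t count_leaks(const struct mem_event *ev, size_t n)
{
  size_t leaks = 0;
  size_t i, j;
  for(i = 0; i < n; i++) {
    int freed = 0;
    if(ev[i].op != MEM_ALLOC)
      continue;
    /* look for a matching free further down the trace */
    for(j = i + 1; j < n && !freed; j++) {
      if(ev[j].op == MEM_FREE && ev[j].addr == ev[i].addr)
        freed = 1;
    }
    if(!freed)
      leaks++;
  }
  return leaks;
}
```

  The quadratic scan keeps the sketch short; a real tool would rather keep a
  table of outstanding addresses while reading the file once.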
791
792<a name="multi_socket"></a>
793`multi_socket`
794==============
795
796 Implementation of the `curl_multi_socket` API
797
798  The main ideas of this API are simply:
799
800   1 - The application can use whatever event system it likes as it gets info
801       from libcurl about what file descriptors libcurl waits for what action
802       on. (The previous API returns `fd_sets` which is very select()-centric).
803
804   2 - When the application discovers action on a single socket, it calls
805       libcurl and informs that there was action on this particular socket and
806       libcurl can then act on that socket/transfer only and not care about
807       any other transfers. (The previous API always had to scan through all
808       the existing transfers.)
809
810  The idea is that [`curl_multi_socket_action()`][7] calls a given callback
811  with information about what socket to wait for what action on, and the
812  callback only gets called if the status of that socket has changed.
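
  The "only when the status has changed" rule can be modelled in a few lines.
  This is a simplified sketch, not libcurl's code: `enum want`,
  `struct sock_state`, `notify_if_changed()` and `count_calls()` are
  hypothetical names, and the real action values are the `CURL_POLL_*`
  constants.

```c
#include <stddef.h>

/* Hypothetical action values standing in for the CURL_POLL_* constants. */
enum want { WANT_NONE, WANT_IN, WANT_OUT, WANT_INOUT };

typedef void (*socket_cb)(int sockfd, enum want what, void *userp);

struct sock_state {
  int sockfd;
  enum want last;   /* what the application was last told about this socket */
};

/* Tell the application about 'now' only if it differs from what it was last
   told. Returns 1 if the callback was invoked, 0 otherwise. */
int notify_if_changed(struct sock_state *s, enum want now,
                      socket_cb cb, void *userp)
{
  if(s->last == now)
    return 0;
  s->last = now;
  cb(s->sockfd, now, userp);
  return 1;
}

/* Example application callback: counts how often it gets called. */
void count_calls(int sockfd, enum want what, void *userp)
{
  (void)sockfd;
  (void)what;
  ++*(int *)userp;
}
```

  With this shape, an application that registers the socket in its event
  system from within the callback never has to re-register a socket whose
  wanted action stays the same.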
813
  We also added a timer callback that makes libcurl call the application when
  the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
  and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
  internally there's an added struct to each easy handle in which we store
  an "expire time" (if any). The structs are then "splay sorted" so that we
  can add and remove times from the tree and yet somewhat swiftly
  figure out both how long there is until the next nearest timer expires
  and which timer (handle) we should take care of now. Of course, the upside
  of all this is that we get a [`curl_multi_timeout()`][8] that should also
  work with old-style applications that use [`curl_multi_perform()`][11].
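
  The two questions the expire times have to answer (how long until the
  nearest timer fires, and which handle owns it) can be sketched as below.
  The sketch assumes a plain array and a linear scan in place of the splay
  tree, and `toy_handle` and `next_timeout()` are hypothetical names. Like
  [`curl_multi_timeout()`][8], it reports -1 when no timer is set.

```c
#include <stddef.h>

struct toy_handle {
  long expire_ms;   /* absolute expire time in ms, or 0 when no timer is set */
};

/* Return the number of milliseconds until the nearest timer expires (0 if it
   already has), or -1 when no handle has a timer set. If 'nearest' is
   non-NULL and a timer exists, it receives the index of that handle. */
long next_timeout(const struct toy_handle *h, size_t n, long now_ms,
                  size_t *nearest)
{
  long best = -1;
  size_t i;
  for(i = 0; i < n; i++) {
    long left;
    if(!h[i].expire_ms)
      continue;              /* this handle has no timer */
    left = h[i].expire_ms - now_ms;
    if(left < 0)
      left = 0;              /* already expired: handle it right away */
    if(best < 0 || left < best) {
      best = left;
      if(nearest)
        *nearest = i;
    }
  }
  return best;
}
```

  A sorted tree gives the same answers without visiting every handle, which
  matters once many transfers are in flight.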
824
825  We created an internal "socket to easy handles" hash table that given
826  a socket (file descriptor) returns the easy handle that waits for action on
827  that socket.  This hash is made using the already existing hash code
828  (previously only used for the DNS cache).
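
  A minimal model of such a socket-to-handle lookup is shown below. It is a
  toy chained hash table with a fixed number of slots, not libcurl's hash
  code; `sock_table`, `sock_entry`, `sock_add()` and `sock_lookup()` are
  hypothetical names, and the handle is kept as an opaque pointer.

```c
#include <stddef.h>

#define SLOTS 16   /* toy table size */

struct sock_entry {
  int sockfd;                /* key */
  void *easy;                /* value: the handle waiting on sockfd */
  struct sock_entry *next;   /* chain of entries sharing a slot */
};

struct sock_table {
  struct sock_entry *slot[SLOTS];
};

static size_t slot_for(int sockfd)
{
  return (size_t)(unsigned int)sockfd % SLOTS;
}

/* Link a caller-owned entry into the table. */
void sock_add(struct sock_table *t, struct sock_entry *e)
{
  size_t i = slot_for(e->sockfd);
  e->next = t->slot[i];
  t->slot[i] = e;
}

/* Return the handle registered for sockfd, or NULL if there is none. */
void *sock_lookup(const struct sock_table *t, int sockfd)
{
  const struct sock_entry *e;
  for(e = t->slot[slot_for(sockfd)]; e; e = e->next)
    if(e->sockfd == sockfd)
      return e->easy;
  return NULL;
}
```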
829
830  To make libcurl able to report plain sockets in the socket callback, we had
831  to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that
832  the conversion from sockets to `fd_sets` for that function is only done in
833  the last step before the data is returned. I also had to extend c-ares to
834  get a function that can return plain sockets, as that library too returned
835  only `fd_sets` and that is no longer good enough. The changes done to c-ares
836  are available in c-ares 1.3.1 and later.
837
838<a name="structs"></a>
839Structs in libcurl
840==================
841
842This section should cover 7.32.0 pretty accurately, but will make sense even
843for older and later versions as things don't change drastically that often.
844
845## Curl_easy
846
  The `Curl_easy` struct is the one returned to the outside in the external API
  as a "CURL *". This is usually known as an easy handle in API documentation
  and examples.
850
851  Information and state that is related to the actual connection is in the
852  'connectdata' struct. When a transfer is about to be made, libcurl will
853  either create a new connection or re-use an existing one. The particular
854  connectdata that is used by this handle is pointed out by
855  `Curl_easy->easy_conn`.
856
857  Data and information that regard this particular single transfer is put in
858  the SingleRequest sub-struct.
859
860  When the `Curl_easy` struct is added to a multi handle, as it must be in
861  order to do any transfer, the ->multi member will point to the `Curl_multi`
862  struct it belongs to. The ->prev and ->next members will then be used by the
863  multi code to keep a linked list of `Curl_easy` structs that are added to
864  that same multi handle. libcurl always uses multi so ->multi *will* point to
865  a `Curl_multi` when a transfer is in progress.
866
867  ->mstate is the multi state of this particular `Curl_easy`. When
868  `multi_runsingle()` is called, it will act on this handle according to which
869  state it is in. The mstate is also what tells which sockets to return for a
870  specific `Curl_easy` when [`curl_multi_fdset()`][12] is called etc.
871
  The libcurl source code generally uses the name 'data' for the variable that
  points to the `Curl_easy`.
874
875  When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated with
876  an individual stream, sharing the same connectdata struct. Multiplexing
877  makes it even more important to keep things associated with the right thing!
878
879## connectdata
880
  A general idea in libcurl is to keep connections around in a connection
  "cache" after they have been used in case they will be used again and then
  re-use an existing one instead of creating a new one, as that gives a
  significant performance boost.
885
886  Each 'connectdata' identifies a single physical connection to a server. If
887  the connection can't be kept alive, the connection will be closed after use
888  and then this struct can be removed from the cache and freed.
889
890  Thus, the same `Curl_easy` can be used multiple times and each time select
891  another connectdata struct to use for the connection. Keep this in mind, as
892  it is then important to consider if options or choices are based on the
893  connection or the `Curl_easy`.
894
895  Functions in libcurl will assume that connectdata->data points to the
896  `Curl_easy` that uses this connection (for the moment).
897
898  As a special complexity, some protocols supported by libcurl require a
899  special disconnect procedure that is more than just shutting down the
900  socket. It can involve sending one or more commands to the server before
901  doing so. Since connections are kept in the connection cache after use, the
902  original `Curl_easy` may no longer be around when the time comes to shut down
903  a particular connection. For this purpose, libcurl holds a special dummy
904  `closure_handle` `Curl_easy` in the `Curl_multi` struct to use when needed.
905
906  FTP uses two TCP connections for a typical transfer but it keeps both in
907  this single struct and thus can be considered a single connection for most
908  internal concerns.
909
  The libcurl source code generally uses the name 'conn' for the variable that
  points to the connectdata.
912
913## Curl_multi
914
915  Internally, the easy interface is implemented as a wrapper around multi
916  interface functions. This makes everything multi interface.
917
918  `Curl_multi` is the multi handle struct exposed as "CURLM *" in external
919  APIs.
920
921  This struct holds a list of `Curl_easy` structs that have been added to this
922  handle with [`curl_multi_add_handle()`][13]. The start of the list is
923  `->easyp` and `->num_easy` is a counter of added `Curl_easy`s.
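
  The bookkeeping described here can be pictured with a stripped-down model.
  The `toy_easy` and `toy_multi` structs and the two functions are
  hypothetical and only carry the members mentioned in the text; inserting at
  the head of the list is a choice made for brevity, not a statement about
  where libcurl puts new handles.

```c
#include <stddef.h>

struct toy_multi;

struct toy_easy {
  struct toy_easy *prev;
  struct toy_easy *next;
  struct toy_multi *multi;   /* set while added to a multi handle */
};

struct toy_multi {
  struct toy_easy *easyp;    /* first handle in the list */
  int num_easy;              /* number of handles in the list */
};

/* Insert 'data' at the head of the multi handle's list. */
void toy_add(struct toy_multi *m, struct toy_easy *data)
{
  data->prev = NULL;
  data->next = m->easyp;
  if(m->easyp)
    m->easyp->prev = data;
  m->easyp = data;
  data->multi = m;
  m->num_easy++;
}

/* Unlink 'data' from the list it was added to. */
void toy_remove(struct toy_multi *m, struct toy_easy *data)
{
  if(data->prev)
    data->prev->next = data->next;
  else
    m->easyp = data->next;
  if(data->next)
    data->next->prev = data->prev;
  data->prev = data->next = NULL;
  data->multi = NULL;
  m->num_easy--;
}
```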
924
925  `->msglist` is a linked list of messages to send back when
926  [`curl_multi_info_read()`][14] is called. Basically a node is added to that
927  list when an individual `Curl_easy`'s transfer has completed.
928
929  `->hostcache` points to the name cache. It is a hash table for looking up
930  name to IP. The nodes have a limited life time in there and this cache is
931  meant to reduce the time for when the same name is wanted within a short
932  period of time.
933
934  `->timetree` points to a tree of `Curl_easy`s, sorted by the remaining time
935  until it should be checked - normally some sort of timeout. Each `Curl_easy`
936  has one node in the tree.
937
  `->sockhash` is a hash table that allows fast lookups from a socket
  descriptor to the `Curl_easy` that uses that descriptor. This is necessary
  for the `multi_socket` API.
941
942  `->conn_cache` points to the connection cache. It keeps track of all
943  connections that are kept after use. The cache has a maximum size.
944
945  `->closure_handle` is described in the 'connectdata' section.
946
  The libcurl source code generally uses the name 'multi' for the variable that
  points to the `Curl_multi` struct.
949
950## Curl_handler
951
952  Each unique protocol that is supported by libcurl needs to provide at least
953  one `Curl_handler` struct. It defines what the protocol is called and what
954  functions the main code should call to deal with protocol specific issues.
955  In general, there's a source file named [protocol].c in which there's a
956  "struct `Curl_handler` `Curl_handler_[protocol]`" declared. In url.c there's
957  then the main array with all individual `Curl_handler` structs pointed to
958  from a single array which is scanned through when a URL is given to libcurl
959  to work with.
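
  The scan over the handler array can be sketched as follows. This is a
  simplified stand-in, not the code in url.c: `toy_handler`, `toy_protocols`
  and `find_handler()` are hypothetical names, and the struct only carries a
  scheme name and a default port.

```c
#include <stddef.h>
#include <ctype.h>

struct toy_handler {
  const char *scheme;   /* URL scheme name */
  int defport;          /* default port */
};

static const struct toy_handler toy_http = { "HTTP", 80 };
static const struct toy_handler toy_https = { "HTTPS", 443 };
static const struct toy_handler toy_ftp = { "FTP", 21 };

/* NULL-terminated array pointing to the individual handler structs. */
static const struct toy_handler * const toy_protocols[] = {
  &toy_http, &toy_https, &toy_ftp, NULL
};

/* Case-insensitive comparison of two scheme names; returns 1 on a match. */
static int same_scheme(const char *a, const char *b)
{
  while(*a && *b) {
    if(toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;   /* both strings must end at the same point */
}

/* Return the handler for the given scheme, or NULL if it is not supported. */
const struct toy_handler *find_handler(const char *scheme)
{
  const struct toy_handler * const *pp;
  for(pp = toy_protocols; *pp; pp++)
    if(same_scheme((*pp)->scheme, scheme))
      return *pp;
  return NULL;
}
```

  Note how "HTTPS" gets its own entry rather than being a variation of the
  "HTTP" one.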
960
  `->scheme` is the URL scheme name, usually spelled out in uppercase. That's
  "HTTP" or "FTP" etc. SSL versions of the protocol need their own
  `Curl_handler` setup, so HTTPS is separate from HTTP.
963
  `->setup_connection` is called to allow the protocol code to allocate
  protocol specific data that then gets associated with that `Curl_easy` for
  the rest of this transfer. It gets freed again at the end of the transfer.
  It will be called before the 'connectdata' for the transfer has been
  selected/created. Most protocols will allocate their private
  'struct [PROTOCOL]' here and assign `Curl_easy->req.protop` to point to it.
970
971  `->connect_it` allows a protocol to do some specific actions after the TCP
972  connect is done, that can still be considered part of the connection phase.
973
974  Some protocols will alter the `connectdata->recv[]` and
975  `connectdata->send[]` function pointers in this function.
976
977  `->connecting` is similarly a function that keeps getting called as long as
978  the protocol considers itself still in the connecting phase.
979
  `->do_it` is the function called to issue the transfer request. What we call
  the DO action internally. If the DO is not enough and things need to be kept
  getting done for the entire DO sequence to complete, `->doing` is then
  usually also provided. Each protocol that needs to do multiple commands or
  similar for do/doing needs to implement its own state machine (see SCP,
  SFTP, FTP). Some protocols (only FTP and only due to historical reasons) have
  a separate piece of the DO state called `DO_MORE`.

  `->doing` keeps getting called while issuing the transfer request command(s).
989
990  `->done` gets called when the transfer is complete and DONE. That's after the
991  main data has been transferred.
992
993  `->do_more` gets called during the `DO_MORE` state. The FTP protocol uses
994  this state when setting up the second connection.
995
  `->proto_getsock`
  `->doing_getsock`
  `->domore_getsock`
  `->perform_getsock`
  Functions that return socket information. Which socket(s) to wait for which
  action(s) during the particular multi state.

  `->disconnect` is called immediately before the TCP connection is shut down.

  `->readwrite` gets called during transfer to allow the protocol to do extra
  reads/writes.

  `->defport` is the default TCP or UDP port this protocol uses.

  `->protocol` is one or more bits in the `CURLPROTO_*` set. The SSL versions
  have their "base" protocol set and then the SSL variation. Like
  "HTTP|HTTPS".

  `->flags` is a bitmask with additional information about the protocol that
  will make it get treated differently by the generic engine:
1016
1017  - `PROTOPT_SSL` - will make it connect and negotiate SSL
1018
1019  - `PROTOPT_DUAL` - this protocol uses two connections
1020
1021  - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
1022    connection. This flag is no longer used by code, yet still set for a bunch
1023    of protocol handlers.
1024
1025  - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
1026    limit which "direction" of socket actions that the main engine will
1027    concern itself with.
1028
1029  - `PROTOPT_NONETWORK` - a protocol that doesn't use network (read file:)
1030
1031  - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
1032    one unless one is provided
1033
1034  - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
1035    (?foo=bar)
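
  How a generic engine might consult such a bitmask is sketched below. The
  `TOY_PROTOPT_*` bit values are made up for this example (the real
  `PROTOPT_*` definitions live in the libcurl source), and `needs_tls()` and
  `uses_network()` are hypothetical helpers.

```c
/* Hypothetical bit values, chosen for this sketch only. */
#define TOY_PROTOPT_SSL       (1u << 0)   /* connect and negotiate SSL */
#define TOY_PROTOPT_DUAL      (1u << 1)   /* uses two connections */
#define TOY_PROTOPT_NONETWORK (1u << 2)   /* doesn't use the network */
#define TOY_PROTOPT_NEEDSPWD  (1u << 3)   /* needs a password */

/* Non-zero if the protocol wants SSL negotiated on connect. */
int needs_tls(unsigned int flags)
{
  return (flags & TOY_PROTOPT_SSL) != 0;
}

/* Non-zero unless the protocol is marked as not using the network. */
int uses_network(unsigned int flags)
{
  return (flags & TOY_PROTOPT_NONETWORK) == 0;
}
```

  A handler then simply ORs together the bits that apply to it, for instance
  `TOY_PROTOPT_SSL | TOY_PROTOPT_DUAL` for a protocol that runs two
  connections over SSL.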
1036
1037## conncache
1038
  This is a hash table with connections for later re-use. Each `Curl_easy` has
  a pointer to its connection cache. Each multi handle sets up a connection
  cache that all added `Curl_easy`s share by default.
1042
1043## Curl_share
1044
1045  The libcurl share API allocates a `Curl_share` struct, exposed to the
1046  external API as "CURLSH *".
1047
1048  The idea is that the struct can have a set of its own versions of caches and
1049  pools and then by providing this struct in the `CURLOPT_SHARE` option, those
1050  specific `Curl_easy`s will use the caches/pools that this share handle
1051  holds.
1052
1053  Then individual `Curl_easy` structs can be made to share specific things
1054  that they otherwise wouldn't, such as cookies.
1055
1056  The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
1057  session cache.
1058
1059## CookieInfo
1060
  This is the main cookie struct. It holds all known cookies and related
  information. Each `Curl_easy` has its own private CookieInfo even when
  it is added to a multi handle. Handles can be made to share cookies by using
  the share API.
1065
1066
1067[1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
1068[2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
1069[3]: https://c-ares.haxx.se/
1070[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
1071[5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
1072[6]: https://curl.haxx.se/docs/manpage.html#--compressed
1073[7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
1074[8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
1075[9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
1076[10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
1077[11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
1078[12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
1079[13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
1080[14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
1081[15]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2
1082