GSoC with GNU Wget2 - Part II
This is story about my second journey in the Google Summer of Code (GSoC) 2017.
From the previous phase, I made some basic progress. I also got many unresolved
tasks to be done. Started on this week, I try to follow Darshit Shah advice to
increase my report from weekly to daily. However, in reality, it was difficult
to create daily report, there must be day that I report nothing because what I
do was actually stuck in my problem.
What I have got so far:
Started working on wget_test_start_server()
to use Libmicrohttpd as HTTP
server. Workflow to resolve this:
- Remove initial process for HTTP server socket
- Create
_http_server_start()
function, wrapper for Libmicrohttpd. There is also functionanswer_to_connection()
which use to create proper HTTP response - Use select method (
MHD_USE_SELECT_INTERNALLY
) for threading model in Libmicrohttpd to get better compatibility http_server_port
seized automatically using Libmicrohttpd function by passingMHD_DAEMON_INFO_BIND_PORT
orMHD_DAEMON_INFO_LISTEN_FD
parameter toMHD_get_daemon_info()
- Using iteration to parse urls data in
answer_to_connection()
. This guarantee we can pass any variadic data to Libmicrohttpd and prevent segmentation fault - Fix
answer_to_connection()
function to create proper HTTP response:- Handle arguments (query string) on URL
- Handle redirection
- Handle chunked transfer encoding
- Handle directory creation
- Handle URL with IRI object
- Handle URL with IDN hostname
- Handle URL with query string which contains space
- Handle If-Modified-Since header
- Fix segmentation fault error when build on Fedora/Clang CI runner. This
caused by former variable
http_server_tid
that forget to removed
Things which would be done in the upcoming week:
- There are still problem occurred when running
make check-valgrind
on some tests. This prevent me to pass the CI test - Fix all remaining unfinished test covered by Libmicrohttpd code:
- test-wget1.c: add capability for HTTP 406 Range Not Satisfiable
- test-auth-basic.c: add basic authentication test
- test-metalink.c: looping
- test-i-https.c: add HTTPS test
- Fix Libmicrohttpd depedency on Wget2, just required when need to run test suite
5th Week
I started my week with ask a question into my mentors. I try to resolve the check-valgrind problems. What I do so far, I split tests into two groups: pass and fail. I pick one from the fail group, take one from example, test-directory-clash. From the valgrind log it shows:
==30425== ERROR SUMMARY: 24 errors from 2 contexts (suppressed: 0 from 0)
==30425==
==30425== 4 errors in context 1 of 2:
==30425== Thread 2:
==30425== Invalid read of size 1
==30425== at 0x556F570: __strcmp_sse2_unaligned (strcmp-sse2-unaligned.S:24)
==30425== by 0x4E6D1C2: wget_strcmp (utils.c:82)
==30425== by 0x40CF61: try_connection (wget.c:1088)
==30425== by 0x40D1E6: establish_connection (wget.c:1151)
==30425== by 0x40EA0D: downloader_thread (wget.c:1626)
==30425== by 0x52BA6B9: start_thread (pthread_create.c:333)
==30425== by 0x55D73DC: clone (clone.S:109)
==30425== Address 0xa716033 is 163 bytes inside a block of size 180 free'd
==30425== at 0x4C2EDEB: free (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30425== by 0x4E5EDD0: wget_iri_free (iri.c:178)
==30425== by 0x408267: host_remove_job (host.c:347)
==30425== by 0x40EC4E: downloader_thread (wget.c:1698)
==30425== by 0x52BA6B9: start_thread (pthread_create.c:333)
==30425== by 0x55D73DC: clone (clone.S:109)
==30425== Block was alloc'd at
==30425== at 0x4C2DB8F: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30425== by 0x4E6F2C2: wget_malloc (xalloc.c:85)
==30425== by 0x4E5EE69: wget_iri_parse (iri.c:231)
==30425== by 0x4E601AC: wget_iri_parse_base (iri.c:623)
==30425== by 0x407FC6: host_add_robotstxt_job (host.c:288)
==30425== by 0x40ACE0: add_url_to_queue (wget.c:438)
==30425== by 0x40C273: main (wget.c:853)
I try to modify the test, adding option –no-robots. The result, valgrind check exited normally. Therefore, I make a conclusion and guest that the robots.txt processor on Wget2 and my Libmicrohttpd code in libtest.c trigger memory leak. I have no idea about this. I can just guest that this is related to libgcrypt error previously said by Christian Grothoff. Although I still found that libgcrypt error in passed test.
Evgeny Grin gives a review about my works, he direct pointed to the code I attached for review purpose.
static char *_scan_directory(const char* data)
{
char *path = strchr(data, '/');
if (path != 0) {
return path;
}
else
return NULL;
}
He asks why do I use underscore as prefix for functions. Usually it is used by libraries to avoid name conflict. First, I think that I need to remove the underscore, then Tim Ruehsen also give comment that this is also useful in projects where there is more than one C file. They use it to make clear that a function/variable is static (not consequently everywhere, though). So, I leaved the underscore as is.
static char *_parse_hostname(const char* data)
{
if (!wget_strncasecmp_ascii(data, "http://", 7)) {
char *path = strchr(data += 7, '/');
return path;
} else
return NULL;
}
static char *_replace_space_with_plus(char *data)
{
if (strchr(data, ' ') != 0) {
char *result = data;
char *wk, *s;
wk = s = strdup(data);
while (*s != 0) {
if (*s == ' '){
*data++ = '+';
++s;
} else
*data++ = *s++;
}
*data = '\0';
free(wk);
return result;
} else
return data;
}
He asks me about the reason for using strdup()
/free()
. I checked string
twice (by strchr()
and by my custom iterations). It just waste of CPU time.
He gives simpler implementation:
{
while(0 != *data)
{
if (' ' == *data)
*data = '+';
data++;
}
return data;
}
I applied the changes to my code.
Evgeny also give note that according to current HTTP specification ‘+’ must NOT
be used as replacement for ‘ ’ (space) in URLs. When I learn about this problem, it
leads me to here [0]. Then if it need to be applied, I need to modify test file:
test-base.c to not use ‘+’, and use %2B instead.
Tim added that the ‘+’ was always just for the query part. He asks to Evgeny
what document are you exactly referring to to. Not that he is against dropping
the ‘+’ rule, but what consortium is not accepted as normative by everyone,
while IETF is. He is unsure about what ‘spec’ to follow. So, I keep my changes
of how I treat the ‘ ’ (space).
static int print_out_key(void *cls, enum MHD_ValueKind kind, const char *key,
const char *value)
{
if (key && url_it == 0 && url_it2 == 0) {
wget_buffer_strcpy(url_arg, "?");
_replace_space_with_plus((char *) key);
Evgeny said that we are not allowed to modify any content pointed by pointer to const. By dropping ‘const’ qualifiers are violating API. In other words: I was modifying internal structures that are not expected to be modified and the result is unpredictable.
wget_buffer_strcat(url_arg, key);
if (value) {
wget_buffer_strcat(url_arg, "=");
_replace_space_with_plus((char *) value);
wget_buffer_strcat(url_arg, value);
}
}
if (key && url_it != 0 && url_it2 == 0) {
wget_buffer_strcat(url_arg, "&");
_replace_space_with_plus((char *) key);
wget_buffer_strcat(url_arg, key);
if (value) {
wget_buffer_strcat(url_arg, "=");
_replace_space_with_plus((char *) value);
wget_buffer_strcat(url_arg, value);
}
}
url_it++;
return MHD_YES;
}
I fixed them then.
He also gives me some questions:
- Is
url_arg
a global variable? - Do I use single thread only?
- Why not to pass pointer to
url_arg
ascls
? - What about
url_it
andurl_it2
? It is hard to guess meaning from name.
If I need to pass all of them, I was advised to define some structure with three pointers and pass pointer to structure.
static int answer_to_connection (void *cls,
struct MHD_Connection *connection,
const char *url,
const char *method,
const char *version,
const char *upload_data, size_t *upload_data_size, void **con_cls)
{
struct MHD_Response *response;
int ret;
url_arg = wget_buffer_alloc(1024);
MHD_get_connection_values (connection, MHD_GET_ARGUMENT_KIND, print_out_key, NULL);
url_it2 = url_it;
wget_buffer_t *url_full = wget_buffer_alloc(1024);
wget_buffer_strcpy(url_full, url);
if (url_arg)
wget_buffer_strcat(url_full, url_arg->data);
if (!strcmp(url_full->data, "/"))
wget_buffer_strcat(url_full, "index.html");
url_it = url_it2 = 0;
unsigned int itt, found = 0;
for (itt = 0; itt < nurls; itt++) {
char *dir = _scan_directory(url_full->data + 1);
if (dir != 0 && !strcmp(dir, "/"))
wget_buffer_strcat(url_full, "index.html");
char *host = _parse_hostname(url_full->data);
if (host != 0 && !strcmp(host, "/"))
wget_buffer_strcat(url_full, "index.html");
wget_buffer_t *iri_url = wget_buffer_alloc(1024);
wget_buffer_strcpy(iri_url, urls[itt].name);
MHD_http_unescape(iri_url->data);
if (urls[itt].code != NULL &&
!strcmp(urls[itt].code, "302 Redirect") &&
!strcmp(url_full->data, iri_url->data))
{
response = MHD_create_response_from_buffer(strlen("302 Redirect"),
(void *) "302 Redirect", MHD_RESPMEM_PERSISTENT);
for (int itt2 = 0; urls[itt].headers[itt2] != NULL; itt2++) {
const char *header = urls[itt].headers[itt2];
if (header) {
char *header_value = strchr(header, ':');
char *header_key = wget_strmemdup(header, header_value - header);
MHD_add_response_header(response, header_key, header_value + 2);
ret = MHD_queue_response(connection, MHD_HTTP_FOUND, response);
wget_xfree(header_key);
itt = nurls;
found = 1;
} else
itt = nurls;
}
} else if (!strcmp(url_full->data, iri_url->data)) {
response = MHD_create_response_from_buffer(strlen(urls[itt].body),
(void *) urls[itt].body, MHD_RESPMEM_PERSISTENT);
He asks me again to ensure that content of urls[itt].body
will be valid until
end of this connection. Otherwise I must use MHD_RESPMEM_MUST_COPY
as last
parameter. I followed his advice and fix my code later.
Two generic advises that very useful he gave to me:
- Avoid using global variables.
- Use better names for variables. It it hard to understand what mean
iri_url
,itt
anditt2
.
6th Week
I found some difficulties when deal with global variables. I throw a question to mentors. Based on explanation from Evgeny before, I was given a suggestion to avoid using global variables. Then, I modify the code to this:
// for passing URL query string
struct query_string {
wget_buffer_t
*params;
int
*it;
};
static int print_out_key(void *cls, enum MHD_ValueKind kind, const char *key,
const char *value)
{
struct query_string *query = cls;
if (key && query->it == 0) {
wget_buffer_strcpy(query->params->data, "?");
replace_space_with_plus(key);
wget_buffer_strcat(query->params->data, key);
if (value) {
wget_buffer_strcat(query->params->data, "=");
replace_space_with_plus(value);
wget_buffer_strcat(query->params->data, value);
}
}
if (key && query->it != 0) {
wget_buffer_strcat(query->params->data, "&");
replace_space_with_plus(key);
wget_buffer_strcat(query->params->data, key);
if (value) {
wget_buffer_strcat(query->params->data, "=");
replace_space_with_plus(value);
wget_buffer_strcat(query->params->data, value);
}
}
query->it++;
return MHD_YES;
}
static int answer_to_connection(void *cls,
struct MHD_Connection *connection,
const char *url,
const char *method,
const char *version,
const char *upload_data, size_t *upload_data_size, void **con_cls)
{
struct query_string *query;
// get query string
query->params = wget_buffer_alloc(1024);
query->it = 0;
MHD_get_connection_values(connection, MHD_GET_ARGUMENT_KIND, print_out_key, query);
...
wget_buffer_t *url_full = wget_buffer_alloc(1024);
wget_buffer_strcpy(url_full, url);
if (query->params->data)
wget_buffer_strcat(url_full, query->params->data);
wget_buffer_free(&query->params);
}
I need to pass struct query into MHD_get_connection_values
. But, I never get
expected result, often ends with segmentation faults. Then, I changed variable
initialization to this:
struct query_string *query = {wget_buffer_alloc(1024), 0};
Still, the result is not what I expected.
Tim help me with his answers. I asked to switch on warnings (e.g. touch
.manywarnings
and do ./configure
again). Based on it, I will see something
like:
libtest.c: In function 'answer_to_connection':
libtest.c:357:13: warning: 'query' is used uninitialized in this function [-
Wuninitialized]
query->str = wget_buffer_alloc(1024);
~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
I should take such warnings for serious analysis and fix them, e.g. by removing the pointer here:
struct query_str query;
...
query.str = wget_buffer_alloc(1024);
query.it1 = 0;
I just didn’t allocate my previous *query
. I should use a stack allocation as
above. And that is what all I need. Now, I can pass the variable through the
functions. No more global variable needed.
Actually I already enable the manywarnings option, nonetheless I can
interpreting the messages. Now, I learned something.
7th Week
Questions about CI testing
Previously I said that some CI runner still fail such as Gitlab MingW64 and Travis OSX where other test passed.
Gitlab MingW64: I have no idea about how to install Libmicrohttpd on
MingW64 with Debian images. Previously, I can install Libmicrohttpd on Fedora
images, but currently Wget2 use Debian. I ask if someone can give me a hint
about how to install packages on Debian MingW64.
Tim said he doubt there is a MinGW64 Libmicrohttpd package for Fedora (nor for
Debian). But, I always could build my own library, e.g. via git clone
Libmicrohttpd, cd into the directory, build with MinGW64 (and optionally make
install
). Then set the right library paths for Wget2’s ./configure
run as
always.
This might be tedious since there could be some pitfalls not easy to debug. His
suggestion would be to leave this step out until I am ready with all the other
stuff. But when I try, I should do it locally first and note the steps I did.
On Travis Debian test, when I just use apt install libmicrohttpd-dev
in
travis script, it installed Libmicrohttpd <= 0.9.51 which causes problems. So,
I do some workaround with install Libmicrohttpd 0.9.55 from source. Here is the
output of git diff from gitlab-ci and travis.
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index e8d1946..8e77d8e 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -34,6 +34,8 @@ variables:
# * ASan, UBSan, Msan
# * distcheck
Debian GNU/Linux build:
+ before_script:
+ - apt-get update -qq && apt-get install -y -qq libmicrohttpd-dev
image: $CI_REGISTRY/$BUILD_IMAGES_PROJECT:$DEBIAN_BUILD
script:
- ./bootstrap && touch .manywarnings
@@ -55,6 +57,8 @@ Debian GNU/Linux build:
- tests/*.log
clang/Fedora:
+ before_script:
+ - dnf -y install libmicrohttpd-devel
image: $CI_REGISTRY/$BUILD_IMAGES_PROJECT:$FEDORA_BUILD
script:
- export CC=clang
diff --git a/.travis_setup.sh b/.travis_setup.sh
index 916dbdb..6e5041b 100755
--- a/.travis_setup.sh
+++ b/.travis_setup.sh
@@ -14,7 +14,13 @@ if [[ "$TRAVIS_OS_NAME" = "osx" ]]; then
brew install xz
brew install lbzip2
brew install lzip
+ brew install libgcrypt
+ brew install libmicrohttpd
brew link --force gettext
elif [[ "$TRAVIS_OS_NAME" = "linux" ]]; then
+ sudo apt-get -y install wget
+ wget http://ftp.gnu.org/gnu/libmicrohttpd/libmicrohttpd-0.9.55.tar.gz
+ tar zxf libmicrohttpd-0.9.55.tar.gz && cd libmicrohttpd-0.9.55/
+ ./configure --prefix=/usr && make -j$(nproc) && sudo make install
pip install --user cpp-coveralls
fi
I then ask that my snippet above is appropriate or there is any other better solutions, and Tim said it is fine to use it.
Travis OSX: Failure regarding test with HTTPS support. I have checked
that even I provide all dependencies for TLS/SSL support like gnutls and
libgcrypt, it still fail. I don’t know if it fails because how my code read
certificates provided or other reasons.
Tim comments is the problem occurs only on the OSX VM. This looks like
Libmicrohttpd is not starting correctly. The same versions (gnutls 3.5.14 and
Libmicrohttpd 0.9.55) work fine here on Debian unstable. I should try to get
more logging output.
Evgeny also give comments, more output from Libmicrohttpd could give more
information. What he noticed in the meantime is warning passing const char *
to parameter of type char * discards qualifiers
which means that could
modifying read-only memory. I should try to resolve it not by casting to char
*
.
Then I have a question, how could I get more verbose messages log of
Libmicrohttpd? I tried to use MHD_OPTION_EXTERNAL_LOGGER
like in example
provided in msgs_i18n.c
.
httpsdaemon = MHD_start_daemon(MHD_USE_SELECT_INTERNALLY | MHD_FEATURE_MESSAGES |
MHD_USE_TLS | MHD_USE_ERROR_LOG,
port_num, NULL, NULL, &_answer_to_connection, NULL,
MHD_OPTION_EXTERNAL_LOGGER, &error_handler, NULL,
MHD_OPTION_HTTPS_MEM_KEY, key_pem,
But, it doesn’t generated any useful messages. It Just tell me that the HTTPS
server failed in the runtime.
Evgeny replied, most probably I was trying to use Libmicrohttpd build without
HTTPS support. I should check original package. Alternatively, I could try
MHD_is_feature_supported(MHD_FEATURE_SSL)
and skip HTTPS tests if HTTPS is not
supported by Libmicrohttpd. I can also display warning about it during
configure. Before implementing such check, find first Libmicrohttpd version
where MHD_is_feature_supported()
was added.
I asked about which kind of problems I have with Libmicrohttpd 0.9.51. Evgeny
said most recent versions of Libmicrohttpd are stable, there are only a few
“problematic” versions. I replied with show the build log:
The following NEW packages will be installed:
libmicrohttpd-dev libmicrohttpd10
0 upgraded, 2 newly installed, 0 to remove and 25 not upgraded.
Need to get 190 kB of archives.
After this operation, 499 kB of additional disk space will be used.
0% [Working]
Get:1 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty/universe amd64 libmicrohttpd10
+amd64 0.9.33-1 [41.0 kB]
======
libtest.c: In function '_http_server_start':
libtest.c:547:3: error: unknown type name 'MHD_socket'
MHD_socket sock_fd;
^
It seems the version of Libmicrohttpd installed is 0.9.33-1. For quick workaround I installed v0.9.55 from source and it works. He also said, indeed v0.9.33 is too old. But it’s not too hard to handle it:
#include <microhttpd.h>
#if MHD_VERSION <= 0x93302
#define MHD_socket int
#endif
Nice pointer. Maybe I can apply this solution later.
Questions about many warnings on libtest.c and Libmicrohttpd
Here some warning I get when I build libtest.c
on my system. I mentioned my
system that I use Ubuntu 16.04 64 bit with configure command ./configure
--enable-manywarnings
libtest.c: In function ‘_load_file’:
libtest.c:166:11: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (size != fread(buffer, 1, size, fp)) {
^
libtest.c:227:29: warning: passing argument 1 of ‘_replace_space_with_plus’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
_replace_space_with_plus(value);
libtest.c:194:14: note: expected ‘char *’ but argument is of type ‘const char *’
static char *_replace_space_with_plus(char *data)
^
Once I tried to use char *
on _replace_space_with_plus()
so it becomes
_replace_space_with_plus((char *) key)
but it not allowed (API violation).
Tim suggests me to give query-params
to my “replace function” and iterate through
all chars and add these to the buffer. For example:
_replace_space_with_plus(query->params, key);
static void _replace_space_with_plus(wget_buffer_t *buf, const char *data)
{
for (; *data; data++)
wget_buffer_memcat(buf, *data == ' ' ? "+" : data, 1);
}
He said no need to do any optimization at this point.
However, Evgeny tells that using implicit conversion from const char*
to
char*
is the same API violation. But there is a better solution. If I not
allowed to modify original string (as this this is used for other proposes as
well) and I can read original string, just make my own copy of string and
modify it. And I should free my copy when I don’t need it anymore.
Here is another warning messages:
libtest.c: In function ‘_print_authorization’:
libtest.c:252:63: warning: unused parameter ‘kind’ [-Wunused-parameter]
static int _print_authorization(void *cls, enum MHD_ValueKind kind,
^
It used by Libmicrohttpd API but it treated as warning by the compiler.
libtest.c:272:64: warning: unused parameter ‘con_cls’ [-Wunused-parameter]
const char *upload_data, size_t *upload_data_size, void **con_cls)
Same as above, it treated as warning by the compiler, but actually it required.
Solution from Tim, if I found error regarding -Wunused-parameter
is by adding
the attribute G_GNUC_WGET_UNUSED
to the param.
Another warning messages:
libtest.c:429:13: warning: Value MHD_HTTP_REQUESTED_RANGE_NOT_SATISFIABLE is deprecated, use MHD_HTTP_RANGE_NOT_SATISFIABLE
ret = MHD_queue_response(connection, MHD_HTTP_REQUESTED_RANGE_NOT_SATISFIABLE, response);
^
It seems about Libmicrohttpd incompatibility. I will provide version checker
(ifdef MHD version) to fix this.
Tim directs me to the solution:
#ifdef MHD_HTTP_RANGE_NOT_SATISFIABLE
ret = MHD_queue_response(connection, MHD_HTTP_RANGE_NOT_SATISFIABLE,
response);
#else
ret = MHD_queue_response(connection,
MHD_HTTP_REQUESTED_RANGE_NOT_SATISFIABLE, response);
#endif
Very clear explanation from him.
Some warnings regarding unmatching data type:
libtest.c: In function ‘_answer_to_connection’:
libtest.c:365:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int it2 = 0; it2 < countof(urls[it1].headers); it2++) {
^
Tim said it just as simple as add unsigned
keyword in front of data type,
like below:
for (unsigned it2 = 0; it2 < countof(urls[it1].headers); it2++) {
Tim said, in general, I should stick with the Wget2 coding style for
consistency. For example, instead of if (' ' == *data)
use (*data == ' ')
.
He comments about function I write: _get_file_size()
. He suggests to use
stat()
. It’s just single system call. Other function, _load_file()
. He
recommends to use wget built in function, wget_read_file()
as replacement.
Actually I get those functions from Libmicrohttpd implementation example of
HTTPS. Tim suggestions made all of this shorter and simpler.
Problem with HTTP Persistent Connection
One of my report said, I need to solve problem of check-valgrind
. I figure out
that the problem come out from connection state of the response which need to
be “closed” after created. The solution is just to add HTTP header “Connection:
Close” in all response without exception.
Evgeny gives me some clarifications about this. This way I am actually hiding
the problem, not fixing. Both Libmicrohttpd and wget must work perfectly nice
with keep-alive. Moreover, in Libmicrohttpd test suite we run many tests in
two modes: with “keep-alive” and “close” connections to ensure testing of all
sides. He suggest me to fix modification of read-only memory before trying to
resolve valgrind as this modification could easily trigger valgrind errors. If
I run any test in multi-thread mode, he suggests to check thread
synchronisation and concurrent access to modifiable variable.
After learned that I should leave persistent connection to be exist between
client and server, I try to make a change, just by add HTTP header “Connection:
Close” on request that returned 404, 401 and 416 status code, just like in the
old http server code. Even more, I actually don’t have to close connection
after those response codes. Otherwise, I leave the connection as is
(keep-alive). But, I still found other issues from the result.
The test case fail if they found robots.txt in server, in other words,
robots.txt returned status 200 OK. Here the error log from running the test
with address sanitizer enabled:
HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Length: 85
Content-Type: text/plain
Date: Thu, 20 Jul 2017 19:17:43 GMT
21.021743.773 method 2
21.021743.773 nbytes 85 total 85/85
21.021743.773 keep_alive=1
21.021743.773 Scanning robots.txt ...
21.021743.773 host_remove_job: 0x60c00000b980
21.021743.773 host_remove_job: qsize=1 host->qsize=1
21.021743.774 [0] action=1 pending=0 host=0x60700000df40
21.021743.774 qsize=1 blocked=0
21.021743.774 pause=-1500578263774
21.021743.774 dequeue job http://localhost:41929/index.html
21.021743.774 main: wake up
=================================================================
==14544==ERROR: AddressSanitizer: heap-use-after-free on address 0x610000007de3 at pc
+0x7f16594162d5 bp 0x7f16504fec60 sp 0x7f16504fe408
READ of size 6 at 0x610000007de3 thread T1
#0 0x7f16594162d4 (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x472d4)
#1 0x7f1659070c91 in wget_strcmp /home/didik/wget2/libwget/utils.c:82
#2 0x41b6d4 in try_connection /home/didik/wget2/src/wget.c:1088
#3 0x41c081 in establish_connection /home/didik/wget2/src/wget.c:1151
#4 0x422f06 in downloader_thread /home/didik/wget2/src/wget.c:1643
#5 0x7f1657e176b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
#6 0x7f1657b4d3dc in clone (/lib/x86_64-linux-gnu/libc.so.6+0x1073dc)
0x610000007de3 is located 163 bytes inside of 180-byte region [0x610000007d40,0x610000007df4)
freed by thread T1 here:
#0 0x7f16594672ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
#1 0x7f165904840b in wget_iri_free /home/didik/wget2/libwget/iri.c:287
#2 0x40d5b0 in host_remove_job /home/didik/wget2/src/host.c:347
#3 0x423800 in downloader_thread /home/didik/wget2/src/wget.c:1715
#4 0x7f1657e176b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
previously allocated by thread T0 here:
#0 0x7f1659467602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
#1 0x7f16590784da in wget_malloc /home/didik/wget2/libwget/xalloc.c:85
#2 0x7f1659048648 in wget_iri_parse /home/didik/wget2/libwget/iri.c:350
#3 0x7f165904da08 in wget_iri_parse_base /home/didik/wget2/libwget/iri.c:818
#4 0x40c9cf in host_add_robotstxt_job /home/didik/wget2/src/host.c:288
#5 0x414bba in add_url_to_queue /home/didik/wget2/src/wget.c:438
#6 0x4192ae in main /home/didik/wget2/src/wget.c:853
#7 0x7f1657a6682f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
Thread T1 created by T0 here:
#0 0x7f1659405253 in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x36253)
#1 0x7f165906d933 in wget_thread_start /home/didik/wget2/libwget/thread.c:48
#2 0x41a3bf in main /home/didik/wget2/src/wget.c:963
#3 0x7f1657a6682f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
SUMMARY: AddressSanitizer: heap-use-after-free ??:0 ??
Shadow bytes around the buggy address:
0x0c207fff8f60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c207fff8f70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c207fff8f80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c207fff8f90: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c207fff8fa0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x0c207fff8fb0: fd fd fd fd fd fd fd fd fd fd fd fd[fd]fd fd fa
0x0c207fff8fc0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
0x0c207fff8fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 fa
0x0c207fff8fe0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c207fff8ff0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c207fff9000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
==14544==ABORTING
Unexpected error code 1, expected 0 [-r -nH]
Removed test directory '.test_14531'
FAIL test-robots (exit status: 1)
What I can see is the process want to use an allocated memory that has been
freed before. But, I still don’t know what exact variable/pointer that trigger
the problem.
Evgeny give comments about this. It should be pretty straightforward. I should
check what is allocated at host_add_robotstxt_job()->wget_iri_parse
iri.c:350
in main thread. Find out how it’s passed to downloader_thread()
.
Find out how it’s freed by host_remove_job()->wget_iri_free()
. Find out why
it’s still used in
downloader_thread()->establish_connection()->try_connection()
at
wget.c:1088
.
Tim give me a clue, this is definitely a bug in Wget2 itself. No matter what
the test server is doing, this should not happen. He will try to reproduce this
issue to investigate later.
Because I feel a little bit rush, I have create a merge request regarding this
[1]. I thought I was doing the right thing, until I realize, I made a mistake.
Instead of posting issue, I made a merge request, which actually I try to
modify code I quite not understand. Hence, I get some cautions about this. But,
because of this, now I know the right procedure how to handle problem in
community. Mentioning an issue itself is good way to contribute to community,
because valid issue will lead our community to find the solution early.
Questions about chunked tranfer encoding
In test case that involving chunked transfer encoding, I found that they just freeze without another reaction, holding up until I send interrupt signal.
HTTP/1.1 200 OK
Connection: Keep-Alive
Content-Length: 15
Transfer-Encoding: chunked
Content-Type: text/plain
Date: Thu, 20 Jul 2017 19:30:55 GMT
21.023055.800 method 1 0 0:
21.023055.800 a nbytes 15 body_len 15
21.023055.800 chunk size is 3
21.023055.800 write full chunk, 3 bytes
21.023055.801 chunk size is 2
21.023055.801 write full chunk, 2 bytes
I don’t know why in the old http server code, they just can close the connection
implicitly after generate the response.
Evgeny replied I should check with debugger first step-by-step chunked download
process.
I was looking in to the manual and examples that we should use
MHD_create_response_from_callback()
to handle chunked transfer encoding
connection. My question, how do we generate a data using
MHD_ContentReaderCallback
? In the examples there just an empty data
(chunked_example.c
) and infinite data (minimal_example_comet.c
)? Say we
have a string as the buffer, something like:
static ssize_t
data_generator (void *cls, uint64_t pos, char *buf, size_t max)
{
if (max < 80)
return 0;
strcpy(buf, "3\r\nan\r\n2\r\example\r\n");
return 80;
}
I also asks, how do I arrange it properly? So the result is like expected
chunked transfer encoding file. I have tried using that example and the
result is infinite streaming file. I try using return value
MHD_CONTENT_READER_END_OF_STREAM
and it give me nothing, just debug
information showing chunked transfer encoding.
Evgeny said, it was indeed not clear from example how to use chunked transfer.
He give a hand by updated chunked_example.c in official git repository. I’d
tried chunked_example.c
again and it was successfully run. But, when I modify
using encoded data like:
static const char response_data[] = "3\r\nan\r\n2\r\example\r\n";
The result was:
3
an
2
example
I was expected a result something like plain: “an example” as the output.
Evgeny explained, I don’t need to handle chunked encoding in application.
Libmicrohttpd automatically wrap supplied data into chunks with local header and
footer. If I want to generate more chunks - simply provide smaller amount of
data with each call of callback. Each part of data will be send as separate
chunk.
I’m actually stuck in this. I asked how do we present a connection with multiple
chunks combined into one complete data? Like what he said, I want to generate
more chunks. I ask if the code below related:
/* Pseudo code. *
if (data_not_ready)
{
// Callback will be called again on next loop.
// Consider suspending connection until data will be ready.
return 0;
}
* End of pseudo code. */
I wonder what should I do with “data_not_ready”? My goal is to simulate chunked transfer encoding between client and server using Wget2 and Libmicrohttpd respectively. Currently, in the Wget2 test suite, it uses encoded data like:
"3\r\nabc\r\n2\r\nde\r\n"
With Libmicrohttpd, I don’t need to provide encoded data like above. It
automatically handled by Libmicrohttpd. So I need to change data provided in
test-chunked.c
.
Evgeny said, I don’t need to change data provided by callback.
I need to change amount of data provided by each call of callback. For example:
if (buf_size < (param->response_size - pos))
size_to_copy = buf_size;
else
size_to_copy = param->response_size - pos;
if (2 < size_to_copy)
size_to_copy = 2;
memcpy (buf, param->response_data + pos, size_to_copy);
Or use more complex algorithm with variable size depending on position.
In this case chunks may have different size.
It’s now clear. I just need to divide data into two smaller chunks.
Another case is, there is a garbage data like FFFFFFF4\nGarbage
need to
produced. I need to simulate it with Libmicrohttpd.
Evgeny answered, depending on where I want to put this data. If I want to put it
in reply body - just use FFFFFFF4\nGarbage
as reply.
But, I still don’t get it. Where is the reply body?
He said reply body is in response data sent by Libmicrohttpd.
If I wrote it in response text, it just literally send me same text as before
processed. Instead, I want to create response with chunk size FFFFFFF4
, like I
wrote in the provided response text (before data encoded).
He then said, if I want to have chunk size FFFFFFF4
, I have to provide data
with size FFFFFFF4
. Libmicrohttpd do not generate invalid HTTP data. For any
test with invalid reply he suggests me to use custom simple hard coded
pseudo-server instead of Libmicrohttpd.
Another question is how can I passed a variable through callback.
wget_buffer_t *chunked_data = wget_buffer_alloc(1024);
wget_buffer_strcpy(chunked_data, "some char");
response = MHD_create_response_from_callback(MHD_SIZE_UNKNOWN,
1024,
&_callback,
chunked_data,
NULL);
My goal is to pass “chunked_data” as response_data for the callback. Evgeny updated chunked example once more. From his point of view, it is pretty simple and clear.
Another question, and maybe just it is my misunderstanding, about how the response text provided. It requires to use char array, while I just have char pointer.
char *body = "abcde";
static char simple_response_text[6];
strcpy(simple_response_text, body);
simple_response_text[sizeof(simple_response_text) - 1] = '\0';
Then I copy the char pointer into char array using strcpy
. Though I still need
to provide the length of the char. As a workaround, I just hard coded the char
length. I ask if there is any better solution.
Evgeny give some explanations, char size is always one byte (it could have
different size, but almost all real platforms use one byte). I should not
assign pointer to static string (which is const char *
) to non-const pointer.
strcpy()
copies termination null as well, no need to terminate it again.
Libmicrohttpd response do not need null-terminated strings, as HTTP could send
any data, not only strings. I can use array variable as pointer to first array
member.
Hence I need to read more about C basics, including static strings, arrays,
pointers, pointers manipulations, pointers arithmetics, strings manipulations
and memory management.
8th Week
I have work on wget_test_start_server()
to use Libmicrohttpd as HTTP server.
To resolve this, I fix answer_to_connection()
function to create proper HTTP
response to:
- Handle arguments (query string) on URL
- Handle redirection
- Handle directory creation
- Handle URL with IRI object
- Handle URL with IDN hostname
- Handle URL with query string which contains space
- Handle If-Modified-Since header
- Handle Byte Range request
- Handle HTTP basic authentication
- Handle HTTPS
Pass CI test on some system like Debian/GCC and Fedora/Clang.
Fix compiler warning -Wunused-param
caused by Libmicrohttpd API.
Use persistent connection (Connection: keep-alive) instead multiple.
connection (Connection: close). This prevent hiding several problems of the
connection between HTTP server and client.
At this point I think that the new HTTP server with Libmicrohttpd for test suite
has already covered all existing test. Later, I can add non existing test that
ready on Wget2 and Libmicrohttpd. Such as add HTTP digest authentication test.
Another unresolved tasks are:
- Fix connection handler using chunked tranfer encoding
- Fix CI testing on Gitlab MingW64 and Travis OSX
Second Period Evaluations
Just like previous time, we as student are required to submit evaluations. The
evaluations itself was just like the one before, it asks us to fill feedback
form to the organization.
The results then announced. Mentors give me chance to pass this period
evaluations. They send me messages that my communication has definitely
improved over time. They think this project is now again moving on the right
track towards completion.
Conclusion
In this period I faced again with many challenges that test my technical skills. My communication abilities are also tested a lot. I must be good in conveying problems to my mentor so there is no miscommunication. Mentors also helped me a lot in solving the problems I have encountered. They give me a trust to finish this project. I should be able to take advantage of this chance to give the best of me in contributing to this project.
Reference(s):
[0] https://stackoverflow.com/questions/1005676/urls-and-plus-signs
[1] https://gitlab.com/gnuwget/wget2/merge_requests/250