sub_401000 is the only subroutine called by main. It is essentially an Internet connection checker: it calls InternetGetConnectedState and compares the return value to 0 in an if/else construct. A nonzero (TRUE) return value means an Internet connection is available; 0 means the host is offline. The source code probably looks like this:
int conn_check()
{
    BOOL ConnectedState;

    /* Returns TRUE (nonzero) when a connection is available. */
    ConnectedState = InternetGetConnectedState(0, 0);
    if (ConnectedState)
    {
        sub_40105F("Success: Internet Connection");
        return 1;
    }
    else
    {
        sub_40105F("Error 1.1: No Internet");
        return 0;
    }
}
The disassembly of sub_401282 and its wrapper sub_40105F shows several characteristics that identify the routine as an internal implementation of a printf-family function. The most prominent sign is a format-string parsing loop: the code repeatedly loads a byte from a format string, increments the pointer, and branches on special characters. One of the input parameters is a pointer to a FILE structure. Inspecting it in memory shows that its fields are hard-coded:
<0, 0, 0, 2, 1, 0, 0, 0>
 ^  ^  ^  ^  ^  ^  ^  ^
 |  |  |  |  |  |  |  └─ _tmpfname = NULL
 |  |  |  |  |  |  └──── _bufsiz = 0
 |  |  |  |  |  └─────── _charbuf = 0
 |  |  |  |  └────────── _file = 1 <- FILE HANDLE (1 = stdout)
 |  |  |  └───────────── _flag = 2 <- _IOWRT (write mode)
 |  |  └──────────────── _base = NULL
 |  └─────────────────── _cnt = 0
 └────────────────────── _ptr = NULL
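The field order above matches the FILE layout used by older Microsoft C runtimes. As a rough sketch, the same record can be written out in C. The names legacy_iobuf, LEGACY_IOWRT, stdout_entry, and is_stdout_writer are made up for this illustration and do not appear in the binary; only the field order and the values come from the dump.

```c
#include <stddef.h>

/* Hypothetical mirror of the legacy CRT FILE record, in dump order. */
struct legacy_iobuf {
    char *_ptr;       /* next character position in the buffer */
    int   _cnt;       /* characters remaining in the buffer */
    char *_base;      /* start of the buffer */
    int   _flag;      /* stream state flags */
    int   _file;      /* underlying file handle */
    int   _charbuf;
    int   _bufsiz;
    char *_tmpfname;
};

#define LEGACY_IOWRT 0x0002 /* write-mode flag, the value 2 seen in the dump */

/* The eight values observed in memory: <0, 0, 0, 2, 1, 0, 0, 0>. */
static const struct legacy_iobuf stdout_entry = {
    NULL, 0, NULL, LEGACY_IOWRT, 1, 0, 0, NULL
};

/* Nonzero when the record describes a writable stream on handle 1 (stdout). */
int is_stdout_writer(const struct legacy_iobuf *f)
{
    return f->_file == 1 && (f->_flag & LEGACY_IOWRT) != 0;
}
```

Calling is_stdout_writer(&stdout_entry) returns nonzero, which is the same conclusion drawn from the dump: the wrapper hands the formatter a write-mode stream bound to stdout.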
The second parameter is a pointer to a format string. At the very start of sub_401282, the code loads the first character of this string into bl and immediately tests it for null:
mov esi, [ebp+arg_4] ; esi = format string
mov bl, [esi] ; load first char
inc esi ; advance pointer
test bl, bl ; check if null
jz loc_4019F8 ; if null, jump to function exit
mov [ebp+arg_4], esi ; save updated pointer
This is a classic early-exit optimization: if the format string is empty, the function can return immediately without performing any further processing. It avoids unnecessary work and prevents the function from entering the main parsing and formatting logic.
Inside sub_401282, each character of the format string is range-checked and then classified through a lookup table before being dispatched:
cmp bl, 20h ; check for space
jl loc_4012E5
cmp bl, 78h ; 'x'
jg loc_4012E5
movsx eax, bl
mov al, byte ptr ds:GetStringTypeW[eax]
and eax, 0Fh
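This is a clear sign of format-string dispatching: each character is classified, and the result selects the case that handles conversion specifiers such as %d, %s, and %x. The GetStringTypeW label is most likely just the nearest symbol IDA found for the table's base address, since the instruction reads a single byte indexed by the character rather than calling the API. Subsequent blocks handle width, precision, flags, length modifiers, and buffer allocation.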
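The range check and masked table lookup in the listing can be approximated in C. This is a simplified sketch, not the CRT's actual table: the class names, the table contents, and the function classify_format_char are assumptions chosen for illustration. Only the bounds 0x20 and 0x78 and the 0x0F mask are taken from the disassembly.

```c
/* Hypothetical character classes; the real CRT table uses its own encoding. */
enum char_class {
    CH_OTHER = 0, CH_PERCENT, CH_DOT, CH_STAR,
    CH_FLAG, CH_WIDTH, CH_SIZE, CH_TYPE
};

#define TABLE_FIRST 0x20 /* ' ', lower bound from "cmp bl, 20h" */
#define TABLE_LAST  0x78 /* 'x', upper bound from "cmp bl, 78h" */

/* One entry per character in [0x20, 0x78]; unlisted entries are CH_OTHER. */
static const unsigned char class_table[TABLE_LAST - TABLE_FIRST + 1] = {
    [' ' - TABLE_FIRST] = CH_FLAG,  ['#' - TABLE_FIRST] = CH_FLAG,
    ['+' - TABLE_FIRST] = CH_FLAG,  ['-' - TABLE_FIRST] = CH_FLAG,
    ['0' - TABLE_FIRST] = CH_FLAG,  ['%' - TABLE_FIRST] = CH_PERCENT,
    ['*' - TABLE_FIRST] = CH_STAR,  ['.' - TABLE_FIRST] = CH_DOT,
    ['1' - TABLE_FIRST] = CH_WIDTH, ['2' - TABLE_FIRST] = CH_WIDTH,
    ['3' - TABLE_FIRST] = CH_WIDTH, ['4' - TABLE_FIRST] = CH_WIDTH,
    ['5' - TABLE_FIRST] = CH_WIDTH, ['6' - TABLE_FIRST] = CH_WIDTH,
    ['7' - TABLE_FIRST] = CH_WIDTH, ['8' - TABLE_FIRST] = CH_WIDTH,
    ['9' - TABLE_FIRST] = CH_WIDTH,
    ['h' - TABLE_FIRST] = CH_SIZE,  ['l' - TABLE_FIRST] = CH_SIZE,
    ['L' - TABLE_FIRST] = CH_SIZE,
    ['c' - TABLE_FIRST] = CH_TYPE,  ['d' - TABLE_FIRST] = CH_TYPE,
    ['i' - TABLE_FIRST] = CH_TYPE,  ['s' - TABLE_FIRST] = CH_TYPE,
    ['u' - TABLE_FIRST] = CH_TYPE,  ['x' - TABLE_FIRST] = CH_TYPE,
    ['X' - TABLE_FIRST] = CH_TYPE,  ['p' - TABLE_FIRST] = CH_TYPE
};

/* Range check first, then a masked table lookup, as in the listing. */
enum char_class classify_format_char(int ch)
{
    if (ch < TABLE_FIRST || ch > TABLE_LAST)
        return CH_OTHER; /* corresponds to the jumps to loc_4012E5 */
    return (enum char_class)(class_table[ch - TABLE_FIRST] & 0x0F);
}
```

With this table, '%' classifies as CH_PERCENT and 'd' as CH_TYPE, while characters outside the checked range, such as '\n' or 'z', fall through to CH_OTHER without touching the table.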
Additionally, temporary buffers are allocated using constants like 0x200 and 0x800, which match known CRT patterns for intermediate storage of converted numeric or wide-character data.
The program simply checks for an Internet connection and prints the resulting message to the terminal.
It performs an Internet connection check, like the one in Lab 6-1.
To identify the purpose of the subroutine at address 0x40117F, we begin by examining its cross-references. In every case where sub_40117F is called, a string is pushed onto the stack immediately before the call. Many of these strings contain format specifiers such as %c, along with escape sequences such as \n, which strongly suggests that this function is used for formatted output.
This observation is reinforced by the usage in main. In the snippet below, the previously parsed value is sign-extended into ecx and pushed onto the stack, followed by a format string:
movsx ecx, [ebp+var_8]
push ecx
push offset aSuccessParsedC ; "Success: Parsed command is %c\n"
call sub_40117F
This calling pattern matches how the standard C library function printf is called under cdecl, where arguments are pushed right to left: first the value to be formatted, then the format string.
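At the source level, the call sequence above would correspond to a single formatted-output call. The sketch below uses snprintf into a caller-supplied buffer instead of printf so the result can be inspected; the function name format_parsed_command and its parameters are hypothetical. Passing a char through a variadic call promotes it to int, which is consistent with the movsx before the push.

```c
#include <stdio.h>

/* Formats the success message for one parsed command character.
   Returns the number of characters written, excluding the terminator. */
int format_parsed_command(char *out, size_t out_size, char command)
{
    /* Source order is (format, value); cdecl pushes them in reverse. */
    return snprintf(out, out_size, "Success: Parsed command is %c\n", command);
}
```

For a command of 'a' and a sufficiently large buffer, the output is the text "Success: Parsed command is a" followed by a newline.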
The second subroutine called by main (located at 0x401040) is responsible for retrieving and parsing a command from a remote web resource. It begins by initializing a WinINet session with the Windows API function InternetOpenA, specifying the user-agent string “Internet Explorer 7.5/pma”. This makes the network activity appear similar to that of a legitimate web browser.
After successfully opening an Internet session, the function attempts to load a remote file hosted at: http://www.practicalmalwareanalysis.com/cc.htm
This is accomplished using the InternetOpenUrlA API. If the URL cannot be opened, the function prints an error message and terminates early.
Once the URL is successfully opened, the function reads up to 512 bytes (0x200) from the remote resource using InternetReadFile. The data is stored in a local stack-based buffer named Buffer. If the read operation fails, an appropriate error message is printed, the Internet handles are closed, and the function returns failure.
After the data is read into memory, the function inspects the first four bytes of the buffer. These bytes are checked sequentially to determine whether they match the ASCII sequence: <!--
This sequence represents the beginning of an HTML comment. The checks are implemented as a series of nested conditional comparisons, effectively equivalent to the following C-style logic:
if (Buffer[0] == '<') {
    if (Buffer[1] == '!') {
        if (Buffer[2] == '-') {
            if (Buffer[3] == '-') {
                return Buffer[4];
            } else error;
        } else error;
    } else error;
} else error;
If all four comparisons succeed, the function extracts and returns the fifth byte of the buffer (Buffer[4]). This byte acts as a command character, presumably interpreted elsewhere in the program.
If any of the comparisons fail, meaning the downloaded content does not begin with an HTML comment, the function prints the error message “Error 2.3: Fail to get command” and returns 0.
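The nested comparisons can be collapsed into a single prefix check. The following is a minimal sketch of equivalent logic, not the malware's actual source: the function name parse_command and the explicit length parameter are assumptions. The length check is an addition for safety; the disassembly described above compares the bytes directly.

```c
#include <stddef.h>
#include <string.h>

/* Returns the command byte that follows a leading HTML comment opener,
   or 0 when the buffer does not start with that opener. */
char parse_command(const char *buffer, size_t length)
{
    static const char marker[] = "<!--";
    const size_t marker_len = sizeof(marker) - 1;

    /* Need the four marker bytes plus one command byte. */
    if (length <= marker_len)
        return 0;
    if (memcmp(buffer, marker, marker_len) != 0)
        return 0; /* the "Error 2.3: Fail to get command" path */
    return buffer[marker_len]; /* Buffer[4] */
}
```

A page starting with the comment opener followed by the letter a yields 'a', while a page starting with an ordinary tag yields 0.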
This program exhibits clear network-based indicators that can be monitored. Specifically, it performs outbound HTTP requests using the user-agent string “Internet Explorer 7.5/pma” and connects to the URL: http://www.practicalmalwareanalysis.com/cc.htm
Monitoring network traffic for this uncommon user-agent string or repeated access to this domain would be effective for detection.
The purpose of this malware is to retrieve a remote command from a web server. It downloads a web page, checks whether the content begins with an HTML comment, and extracts a single-byte command embedded within that comment. If the expected format is not found, the program reports an error and exits cleanly.
When successful, the extracted command character is printed using a formatted output string. The program then sleeps for one minute before terminating. This behavior demonstrates a simple command-and-control (C2) mechanism, where commands are discreetly hidden in seemingly benign web content, making the network traffic less obvious and harder to detect by basic firewall rules.
Compared to Lab 6-2, this version of main introduces a new function call to sub_401130 after successfully retrieving and parsing the command from cc.htm. In Lab 6-2, the command was only printed; in this lab, it is actively processed. This new function is responsible for executing different actions based on the parsed command value.
The function sub_401130 takes two parameters:
The function contains a switch-case construct, implemented via a jump table. The command character is normalized by subtracting 0x61 (‘a’) and then used as an index into the jump table. This allows the malware to efficiently dispatch execution to one of several distinct behaviors.
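The normalization and table lookup can be sketched with an array of function pointers. Everything named here is a placeholder: action_0 through action_2, action_table, and dispatch_command are hypothetical, the handlers only return marker values, and the table size of three is arbitrary because the number of cases is not stated above.

```c
#include <stddef.h>

typedef int (*action_fn)(void);

/* Placeholder handlers standing in for the real per-command behaviors. */
int action_0(void) { return 100; }
int action_1(void) { return 101; }
int action_2(void) { return 102; }

/* Hypothetical jump table; the real table size is not given in the text. */
static const action_fn action_table[] = { action_0, action_1, action_2 };

/* Subtracts 0x61 ('a') and uses the result as a table index.
   Returns -1 for any command outside the table (the default case). */
int dispatch_command(char command)
{
    unsigned int index = (unsigned int)(unsigned char)command - 0x61u;

    /* Unsigned comparison also rejects characters below 'a',
       because the subtraction wraps to a large value. */
    if (index >= sizeof(action_table) / sizeof(action_table[0]))
        return -1;
    return action_table[index]();
}
```

A single unsigned bounds check covers both ends of the range, which is the usual shape of a compiler-generated switch over a dense set of case values.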
Based on the command received, the function can perform the following actions:
This malware exhibits several host-based indicators, including:
The purpose of this malware is to act as a simple command-and-control backdoor. It retrieves a remotely hosted command hidden inside an HTML comment, parses that command, and conditionally executes filesystem or registry-based actions on the infected host.
In Lab 6-4, main introduces a loop that repeatedly calls the network retrieval function (sub_401040), whereas in Lab 6-3 the function was called only once. Specifically, Lab 6-4 uses a loop controlled by var_C that runs 1,440 times, passing the loop counter as an argument to sub_401040 on each iteration.
Additionally, unlike Labs 6-2 and 6-3 where a static User-Agent string was used, Lab 6-4 dynamically modifies the User-Agent for each request using the format string:
Internet Explorer 7.50/pma%d
where %d corresponds to the current loop iteration. The rotating User-Agent strings (pma0, pma1, pma2, ...) are not just for variety: they make each request unique, which defeats simple signature-based blocking that matches on one exact string.
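Building the per-request string amounts to one formatted write. This sketch uses snprintf with the format string quoted above; the function name build_user_agent and its parameters are hypothetical, and the original binary may use a different formatting routine.

```c
#include <stdio.h>

/* Writes the User-Agent for a given loop iteration into out.
   Returns the number of characters written, excluding the terminator. */
int build_user_agent(char *out, size_t out_size, int iteration)
{
    return snprintf(out, out_size, "Internet Explorer 7.50/pma%d", iteration);
}
```

Iteration 0 produces "Internet Explorer 7.50/pma0" and iteration 1439 produces "Internet Explorer 7.50/pma1439", so only the numeric suffix changes between requests.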
A loop construct has been added to main. This loop controls repeated execution of the malware’s command-fetching, command-parsing, and command-execution logic. The loop terminates either after 1,440 iterations or earlier if an error occurs.
The HTML parsing logic itself remains largely the same: it still checks for an HTML comment beginning with <!-- and extracts a single command byte from the response. However, in this lab the parsing function is invoked repeatedly and works together with a dynamic User-Agent string, so each request looks slightly different. This change increases stealth and reduces the likelihood of detection by simple signature-based network defenses.
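The program runs for approximately 24 hours: the loop executes 1,440 times, and assuming the one-minute sleep runs on every iteration, that comes to 1,440 minutes.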
Yes. New network-based indicators include:
Internet Explorer 7.50/pma0
Internet Explorer 7.50/pma1
...
Internet Explorer 7.50/pma1439
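Because the suffix changes on every request, a detection rule has to match the fixed prefix plus a number rather than one exact string. The following is a minimal sketch of such a check; the function name is_lab64_user_agent is hypothetical, and a real deployment would more likely express this as an IDS or proxy rule.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when ua is the fixed prefix followed by a decimal number
   in the range 0..1439, and 0 otherwise. */
int is_lab64_user_agent(const char *ua)
{
    static const char prefix[] = "Internet Explorer 7.50/pma";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *p;
    long value = 0;

    if (strncmp(ua, prefix, prefix_len) != 0)
        return 0;

    p = ua + prefix_len;
    if (*p == '\0')
        return 0; /* a numeric suffix is required */

    for (; *p != '\0'; ++p) {
        if (!isdigit((unsigned char)*p))
            return 0;
        value = value * 10 + (*p - '0');
        if (value > 1439)
            return 0; /* beyond the last loop iteration */
    }
    return 1;
}
```

The upper bound of 1439 follows from the 1,440 loop iterations; dropping that bound and matching any numeric suffix would be a reasonable, slightly broader alternative.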
The purpose of this malware is to function as a persistent command-and-control (C2) agent. It periodically contacts a remote server over the course of 24 hours, retrieves a hidden command embedded within an HTML comment, and executes that command locally.