Binary Data Hiding in VB6 Executables

Overview

This is part one in a series of posts that focus on understanding Visual Basic 6.0 (VB6) code, and the tactics and techniques both malware authors and researchers use around it. 

Abstract 

This document is a running tally covering many of the various ways VB6 malware can embed binary data within an executable. 

There are 4 main categories: 

  • string based encodings 
  • data hidden within the actual opcodes of the program 
  • data hidden within parts of the VB6 file format  
  • data in or around normal PE structures 

Originally I was only going to cover data hidden within the file format itself but for the sake of  documentation I decided it is worth covering them all.  

Data held within the file format is a special case which I find the most interesting. This is because it can be interspersed within a complex set of undocumented structures which would require  advanced knowledge and intricate parsing to detect. In this scenario it would be hard to determine where the data is coming from or to even recognize that these buffers exist.  

Resource Data 

The first technique is the standard built into the language itself, namely loading data from the  resource section. VB6 comes with an add-in that allows users to add a .RES file to the project.  This file gets compiled into the resource section of the executable and allows for binary data to be  easily loaded. 

This is a well known and standard technique. 

Appended Data 

This technique is very old and has been used from all manner of programming language. It will be mentioned again for thoroughness and to link to a public implementation [1] that allows for  simplified use. 

Hex String Buffers 

It is very common for malware to build up a string of hex characters that are later converted back to binary data. Conversion commonly includes various text manipulations such as decryption or  stripping junk character sequences. Extra character sequences are commonly used to prevent  automatic recognition of the data as a hex string by AV.  

In the context of VB6, there are several limitations. The IDE only allows for a total of 1023  characters to be on a single line. VB’s line continuation syntax of &_ is also limited to only 25  lines. For these reasons you will often see large blocks of data embedded in the following format: 

In a compiled binary each string fragment is held as an individual chunk which is easily  identifiable. A faster variant may hold each element in a string array so conglomeration only  occurs once.  

This is a well known and standard technique. It is commonly found in VBA, VB6 and malware  written in many other languages. Line length limitations can not be bypassed through command  line compilation. 

Binary Data Within Images 

There are multiple ways to embed lossless data into image formats. The most common will be to  embed the data directly within the structure of a BITMAP image. Bitmaps can be held directly  within VB6 Image and Picture controls. Data embedded in this manner will be held in the .FRX  form resource file before compilation. Once compiled it will be held in a binary property field for  the target form element. Images created like this can be generated with a special tool, and then  embedded directly into the form using the IDE. 

The following is a public sample[2] of data being extracted from such a bitmap 

Extracted images will display as a series of colored blocks and pixels of various colors. Note that  this is not stenography. 

Many tools understand how to extract embedded images from binary files. Since the image data  still contains the BITMAP header, parsing of the VB6 file format itself is not necessary. This  technique is public and in common use. The data is often decrypted after it is extracted. 

Chr Strings 

Similar to obfuscations found in C malware, strings can be built up at runtime based on individual  byte values. A common example may look like the following: 

At the asm level, this serves to break up each byte value and puts it inline with a bunch of  opcodes preventing automatic detection or display with strings. For native VB6 code it will look  like the following: 

In P-Code it will look like the following: 

This is a well known and standard technique. It is commonly found in VBA as well as VB6  malware. 

Numeric Arrays 

Numeric arrays are a fairly standard technique in malware that are used to break up the binary  data amongst the programs opcodes. This is similar to the Chr technique but can hold data in a  more compact format. The most common data types used for this technique are 4 byte longs, and 8 byte currency types. The main advantage of this technique is that the data can be easily  manipulated with math to decrypt it on the fly. 

Native: 

P-Code: 

Native: 

P-Code: 

This technique is not as popular as the others, but does have a long history of use. I think the first place I saw it was in Flash ActionScript exploits. 

Form Properties 

Forms and embedded GUI elements can contain compiled in data as part of their properties. The  most common attributes used are Form.Caption, Textbox.Text, and any element’s. Tag property. 

Since all of these properties are typically entered via the IDE, they are usually found to contain  ASCII only data that is later decoded to binary. 

Developers can however embed binary data directly into these properties using several  techniques.  

While there is way to hexedit raw data in the .FRX form resource file, this comes with limitations  such as not being able to handle embedded nulls. Another solution is inserting the data post  compilation. With this technique a large buffer is reserved consisting of ASCII text that has start  and end markers. An embedding tool can then be run on the compiled executable to fill in the  buffer with true binary data.  

Using form element properties to house text based data is a common practice and has been seen  in VBA, VB6, and even PDF scripts. Binary data embedded with a post processing step has been observed in the wild. In both P-Code and Native, access to these properties will be through COM object VTable calls.  

From the Semi-VBDecompiler source, each different control type (including ActiveX) has its own  parser for these compiled in property fields. Results will vary based on tool used if they can display the data. Semi-Vbdecompiler has an option to dump property blobs to disk for manual exploration. This may be required to reveal this type of embedded binary data. 

UserControl Properties 

A special case for the above technique occurs with the built in UserControl type. This control is  used for hosting reusable visual elements and in OCX creation. The control has two events which  are passed a PropertyBag object of its internal binary settings. This binary data can be easily set  in the IDE through property pages. This mechanism can be used to store any kind of binary data  including entire file systems. A public example of this technique is available[3]. Embedded data will be held per instance of the UserControl in its properties on the host form. 

Binary Strings 

Compiled VB6 executables store internal strings with a length prefix. Similar to the form properties trick, these entries can be modified post compilation to contain arbitrary binary data. In order to discern these data blobs from other binary data, in depth understanding and complex  parsing of the VB6 file format would have to occur.  

The longest string that can be embedded with this technique is limited by the line length in the  IDE which is 2042 bytes ((1023 bytes – 2 for quotes) *2 for unicode).

VB6 malware can access these strings normally with no special loading procedure. As far as its  concerned the source was simply str = “binary data”

The IDE can handle a number of unicode characters which can be embedded in the source for compilation. Full binary data can be embedded using a post processing technique. 

Error Line Numbers 

VB6 allows for developers to embed line numbers that can be accessed in the event of an error to  help determine its location. This error line number information is stored in a separate table outside of the byte code stream.  

The error line number can be accessed through the Erl() function. VB6 is limited to 0xFFFF line  numbers per function, and line number values must be in the 0-0xFFFF range. Since the size of  the embedded data is limited with this technique, short strings such as passwords and web  addresses are the most likely use.

When the code below is run, it will output the message “secret” 

Advanced knowledge of the VB6 file format would be required in order to discern this data from  other parts of the file. Embedded data is sequential and readable if not encoded in some other  way. 

Function Bodies 

The AddressOf operator allows VB6 easy runtime access to the address of a public function in  a module. It is possible to include a dummy function that is filled with just placeholder instructions to create a blank buffer within the .text section of the executable. This buffer can be easily loaded  into a byte array with a CopyMemory call. A simple post compilation embedding could be used to  fill in the arbitrary data.

For P-Code compiles, AddressOf returns the offset of a loader stub with a structure offset. P-Code compiles would require several extra steps but would still be possible.  

References 

[1] Embedded files appended to executable – theTrik:
https://github.com/thetrik/CEmbeddedFiles 

[2] Embedding binary data in Bitmap images – theTrik: 
http://www.vbforums.com/showthread.php?885395-RESOLVED-Store-binary-data-in UserControl&p=5466661&viewfull=1#post5466661 

[3] UserControl binary data embedding – theTrik:
https://github.com/thetrik/ctlBinData

Share:XFacebook

Menu