Viruses and Anti-viruses


Viruses

Viruses are programs that copy themselves from one file or computer to another without the knowledge of the person using the computer. The two broad categories of viruses are boot-sector viruses and file viruses.

Boot-Sector Viruses

When a computer is booted from a floppy disk, the contents of that disk's boot sector are copied into memory. The CPU then executes a jump to an offset within that memory area, which contains the program code that loads the operating system into memory. This code can be changed to do nasty things to the computer. Even a non-system disk carries a small program in its boot sector, which displays a message saying that the disk is not bootable. Thus any disk is vulnerable, whether it is bootable or not.
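As a rough illustration of what an inspection tool sees in such a sector, the sketch below examines a 512-byte boot sector image. It assumes the conventional PC layout, in which the sector ends with the signature bytes 55h AAh and usually begins with a short (EBh) or near (E9h) jump over the disk parameter data. The function name describe_boot_sector is made up for this example.

```python
import struct

SECTOR_SIZE = 512  # size of one floppy boot sector in bytes


def describe_boot_sector(sector: bytes) -> dict:
    """Report the signature and the target of the leading jump, if any."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly 512 bytes")

    # Assumption: a valid boot sector ends with the bytes 55h AAh.
    has_signature = sector[510:512] == b"\x55\xaa"

    jump_target = None
    if sector[0] == 0xEB:
        # Short jump: opcode plus a signed 8-bit displacement.
        (disp,) = struct.unpack_from("<b", sector, 1)
        jump_target = 2 + disp
    elif sector[0] == 0xE9:
        # Near jump: opcode plus a signed 16-bit displacement.
        (disp,) = struct.unpack_from("<h", sector, 1)
        jump_target = 3 + disp

    return {"has_signature": has_signature, "jump_target": jump_target}
```

The jump target is an offset from the start of the sector, so it shows where the boot code proper begins. Comparing that code against a known clean copy is one simple way to notice that it has been changed.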

When a computer is booted from the hard drive, the master boot record (which holds the partition table) is loaded into memory instead. The same weakness applies, since viral code can reside here too.

In both cases, the first thing that a virus wants to do is propagate itself, so it searches for other drives on the system and infects any disk it may find there.

Some viruses contain a PAYLOAD. This is some action that the virus performs at a specified time. The delay is chosen to be long enough for the virus to propagate far and wide before the payload is triggered. The payload is either harmless (e.g. a message on the screen) or destructive (e.g. erasing the hard drive).

Boot sector viruses have certain restrictions:

Examples of boot-sector viruses are Stoned and Exebug (Weimar).

Executable Files

There are two types of executable files: .EXE and .COM files.

.COM files contain a simple sequence of machine code instructions. COM files cannot be larger than 64 KB in length. To load a COM file, DOS copies all the bytes directly into memory and then sets CS, DS and SS to point to the segment in which the file was loaded. SP is set to the end of the segment, and the stack grows downwards (towards lower addresses) from the end of that same segment. If the stack gets too big, it may overwrite the program! IP is set to 100h, which is the first address after the 256-byte PSP (program segment prefix).
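The arithmetic behind this layout can be sketched as follows. The helper names are invented for this example, and the figures assume the standard 256-byte PSP and a single 64 KB segment.

```python
PSP_SIZE = 0x100        # the PSP occupies offsets 0000h-00FFh of the segment
SEGMENT_SIZE = 0x10000  # one 64 KB real-mode segment


def com_memory_offset(file_offset: int) -> int:
    """Offset within the load segment of the byte at file_offset in a COM file."""
    return PSP_SIZE + file_offset


def com_free_bytes(file_size: int) -> int:
    """Bytes left between the end of the loaded image and the end of the segment.

    The stack starts at the end of the segment and grows down into this space.
    """
    return SEGMENT_SIZE - (PSP_SIZE + file_size)
```

For instance, the first byte of the file lands at offset 100h, which is why IP starts there, and a 1000-byte COM file leaves 64280 bytes for the stack and any run-time data.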

.EXE files contain further information about segments and relocation in a special header at the top of every EXE file. CS and IP are set to the entry point recorded in the header, adjusted for the segment at which the program was loaded. DS does not point to the program's data (DOS leaves it pointing at the PSP), so the program must set it up itself. SS and SP are set to a stack as defined in the EXE header. Relocatable addresses are resolved.
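A minimal sketch of reading that header is given below. It assumes the classic DOS "MZ" layout of fourteen little-endian 16-bit words; the field names are descriptive labels chosen for this example rather than official identifiers.

```python
import struct

# Labels for the first fourteen 16-bit words of a DOS EXE header.
MZ_FIELDS = (
    "magic", "bytes_in_last_page", "pages", "relocations",
    "header_paragraphs", "min_alloc", "max_alloc", "ss", "sp",
    "checksum", "ip", "cs", "reloc_table_offset", "overlay",
)


def parse_mz_header(data: bytes) -> dict:
    """Decode the fixed part of a DOS EXE header into a dictionary."""
    if len(data) < 28:
        raise ValueError("too short to hold an EXE header")
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    return dict(zip(MZ_FIELDS, struct.unpack_from("<14H", data, 0)))
```

The cs, ip, ss and sp entries are the initial register values described above. The segment values are relative to the start of the loaded program, so the loader adds the load segment to them.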

To write an assembly language program that produces a COM file, there must be only one segment (CODE) and all the data must be defined there. There must be no STACK directive and the MODEL must be TINY (meaning that everything is located in one segment). The program must then be linked into a .COM file using the relevant linker option (normally /t).

File Viruses

File viruses attach themselves to the end of executable files. They then patch the first few bytes of the machine code to make a jump to the beginning of the virus. The viral code then executes. After propagating itself or loading itself into memory, the virus restores the bytes at the top of the program and jumps back there, making it seem as if everything is normal.
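From the defender's side, this pattern suggests a simple heuristic for COM files: see whether the file opens with a jump that lands close to its end. The sketch below assumes the patch is a 3-byte near jump (opcode E9h), and the 2048-byte tail window is an arbitrary figure picked for illustration. Plenty of clean programs also start with a jump, so a hit is only a hint that the file deserves a closer look.

```python
import struct

COM_LOAD_OFFSET = 0x100  # COM images are loaded at offset 100h


def entry_jump_target(image: bytes):
    """Return the file offset targeted by a leading near jump, or None."""
    if len(image) < 3 or image[0] != 0xE9:
        return None
    (disp,) = struct.unpack_from("<H", image, 1)
    # The displacement is relative to the instruction after the jump
    # and wraps around within the 64 KB segment.
    memory_offset = (COM_LOAD_OFFSET + 3 + disp) & 0xFFFF
    file_offset = memory_offset - COM_LOAD_OFFSET
    return file_offset if 0 <= file_offset < len(image) else None


def jumps_into_tail(image: bytes, tail_bytes: int = 2048) -> bool:
    """True if the file starts with a jump into its last tail_bytes bytes."""
    target = entry_jump_target(image)
    return target is not None and len(image) - target <= tail_bytes
```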

File viruses have to have separate procedures to infect COM files and EXE files since the format of these two types of files is different.

Examples of file viruses are Matura92 and Saturday14.

Resident Viruses

Most viruses go memory resident once they are run. They then link themselves to the disk-access interrupts.

Boot-sector virus: Normally hooked into interrupt 13h. Whenever a floppy disk is accessed for any reason whatsoever, the new ISR, being part of the virus, first infects the boot-sector before reading from or writing to the disk.

File-virus: Normally hooked into interrupt 21h. Whenever a file is accessed, the new ISR, being part of the virus, first infects the file before reading from or writing to it.

Since infected programs are not necessarily memory resident themselves, the virus has to go to extreme lengths to make itself resident in spite of its host. One option is to allocate memory and keep the virus there, but this doesn't work very well since the virus is easy to spot. Lots of viruses simply copy themselves into the highest 1 kilobyte of memory and assume that no program is really going to use this memory. It's not critical if one does, since the virus can probably infect a whole lot of files before it gets erased by another program.

How to write a file virus

The first problem is that a virus cannot exist without its host. Any virus is always attached to another program. So it may be necessary to write a dummy assembler program around the virus to make it executable on its own.

If the virus is going to be memory-resident, it needs to have a routine to make its code resident in memory. This is probably going to be the first part of the code. Then the virus must be linked to the relevant interrupt and control can be passed back to the host program by jumping to the top of the segment.

The infection routine can either be called upon execution or by the ISR for disk activity. This procedure must search for an executable or use the one accessed, check if it is infected and then infect it. Infection must first save the topmost bytes of the file somewhere within the virus body. Then the top bytes must be changed to jump to the very end of the file. Lastly, the virus must be appended to the file.

Upon execution of the virus, it can decide whether or not to run its payload.

The virus has to be very compact and fast so that its activity is not noticeable on the computer. Viruses do not normally return disk errors, since an error would be a dead giveaway if you were only trying to read from a write-protected disk. Stealth techniques sometimes mask the presence of the virus by using another ISR to hide the increase in size of infected executable files.

More advanced viruses include code to change the size, order and characteristics of the virus so that detection is difficult.

How to write an anti-virus

In order to write an anti-virus it is essential to send the infected file through a debugger or disassembler to produce assembler code. This can then be read in an attempt to understand the workings of the virus. Virus code tends to be a few hundred lines long and can be quite cryptic, since the programmers use every trick possible to speed up the virus and make it smaller.

The first step is to isolate the virus by figuring out where it starts and how long it is. The easy way to do this is to write a small assembly language (or other language) program. Note its size and infect it. The difference in size gives a rough idea of the size of the virus.
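The comparison can be taken one step further by also listing which bytes of the original program were changed. The sketch below assumes both the clean and the infected copy of the test program have been read into memory; compare_images is a name made up for this example.

```python
def compare_images(clean: bytes, infected: bytes):
    """Return (growth, changed), where growth is the increase in size and
    changed lists the offsets within the original length that differ."""
    growth = len(infected) - len(clean)
    overlap = min(len(clean), len(infected))
    changed = [i for i in range(overlap) if clean[i] != infected[i]]
    return growth, changed
```

If the changed offsets are all at the very start of the file and the growth is a few hundred bytes or more, that is consistent with a patched entry point and a virus body appended at the old end of the file.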

Most viruses will store the original bytes from the beginning of the file somewhere within their code. If these bytes can be located, then they can be restored to the top of the file.

EXE files need special information in the header about the size of the file and segment addresses. This is sometimes stored in the virus but at other times, it has to be recalculated using information about the EXE file header.
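The size part of that recalculation is straightforward. The sketch below assumes the classic DOS EXE header, which stores the file length as a count of 512-byte pages plus the number of bytes used in the last page (0 meaning the last page is full). The function names are invented for this example.

```python
PAGE_SIZE = 512  # the EXE header counts the file length in 512-byte pages


def exe_size_fields(file_size: int):
    """Return (bytes_in_last_page, pages) for a file of the given length."""
    if file_size <= 0:
        raise ValueError("file size must be positive")
    pages = (file_size + PAGE_SIZE - 1) // PAGE_SIZE
    return file_size % PAGE_SIZE, pages


def exe_image_size(bytes_in_last_page: int, pages: int) -> int:
    """Inverse of exe_size_fields: the file length described by the header."""
    if bytes_in_last_page == 0:
        return pages * PAGE_SIZE
    return (pages - 1) * PAGE_SIZE + bytes_in_last_page
```

After the virus body has been cut off, these two fields would be recomputed from the new, shorter file length. The entry point and stack values still have to be recovered from the virus itself or from analysis of the file.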

Once the original bytes (and, for EXE files, the header fields) have been restored, the virus can be removed by simply truncating the file at the beginning of the virus.
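For a COM file, the repair can be sketched roughly as below. Every parameter here is an assumption that has to come from analysing the particular virus: virus_start is the file offset where the virus body begins, saved_offset is where within that body the original opening bytes are kept, and saved_len is how many bytes were patched. The sketch also assumes the bytes are stored unencrypted, which is not always the case.

```python
def disinfect_com(image: bytes, virus_start: int, saved_offset: int,
                  saved_len: int = 3) -> bytes:
    """Return a repaired copy of an infected COM image.

    All three position arguments are virus-specific values found by analysis.
    """
    if not saved_len <= virus_start <= len(image):
        raise ValueError("virus_start is outside the file")
    src = virus_start + saved_offset
    original_head = image[src:src + saved_len]
    if len(original_head) != saved_len:
        raise ValueError("saved bytes lie beyond the end of the file")
    # Put the original opening bytes back, keep the rest of the host,
    # and drop everything from the start of the virus onwards.
    return original_head + image[saved_len:virus_start]
```

It is worth working on a copy of the file and checking the result against a known clean version where one exists, since a wrong offset will produce a program that crashes when run.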

Conclusion

Self-replicating code is one of the more advanced uses or abuses of assembly language. Even entire operating systems can be written in high-level languages but virus-writers need to have an intimate knowledge of assembler.