Writing 64-bit programs
by Jeremy Gordon
This file is intended for those interested in writing
64-bit programs for the AMD64 and EM64T processors running on x64 (64-bit Windows),
using GoAsm (assembler), GoRC (resource compiler) and GoLink (linker).
It may also be of interest to those writing 64-bit assembler programs for
Windows using other tools.
Contents
- Introduction to 64-bit programming:
- How easy is 64-bit programming?
- Differences between 32-bit and 64-bit executables
- Differences between Win32 and Win64 (for AMD64/EM64T)
- Differences between x86 and x64 processors:
registers
instructions
RIP-relative addressing
call address sizes
- 64-bit programming in practice
- Changes to Windows data types
- Alignment requirements
- Windows structures in 64-bit programming
- Choice of register
- Zero-extension of results into 64-bit registers
- Sign-extension of results into qwords
- Automatic stack alignment
- Using the same source code for both 32 and 64-bits
- Converting existing 32-bit code to 64-bit
- Using AdaptAsm.exe to help with the conversion
- Some pitfalls to avoid when converting existing source code
- Switching using /x64 and /x86 in conditional assembly
- Assembling and linking to produce the executable
- Some code optimisation and refinement done by GoAsm
- Some tips to reduce the size of your code
- Demonstration files
Hello64World1 (simple 64-bit console program)
Hello64World2 (simple 64-bit windows program)
Hello64World3 (switchable 32-bit or 64-bit windows program)
- More information, references and links
Introduction to 64-bit programming
How easy is 64-bit programming?
Despite the differences between the 64-bit processors and their 32-bit counterparts, and between
the x64 (Win64) operating system and Win32, using GoAsm to write 64-bit Windows programs
is just as easy as it was in Win32.
In fact, you can readily use the same source code to create executables for both platforms if
you follow a set of rules.
You can also convert existing 32-bit source code to 64-bits
and some of the work required to do this can be done automatically using
AdaptAsm.
Differences between 32-bit and 64-bit executables
Although 32-bit and 64-bit executables are based on the same PE (Portable Executable) format,
in fact there are a number of major differences. The extent of those differences means that
32-bit code will only run on Win64 using the Windows on Windows (WOW64) subsystem. This works
by intercepting API calls from the executable and converting the parameters to suit Win64.
64-bit code will not work at all on 32-bit platforms.
The executable contains a flag which tells the system at load-time whether it is 32-bit or
64-bit. If the x64 loader sees a 32-bit executable, WOW64 kicks in automatically. Because the flag
applies to the whole executable, 32-bit and 64-bit code cannot be mixed within the same executable.
The significance of the above is that the programmer has to choose between:-
- Making one version of the application (Win32). This will work on both platforms.
- Making two versions of the application (one for Win32 and one for Win64).
For those who are interested in PE file internals, here is a summary of the main differences
between 32-bit and 64-bit executables:-
- The PE file format for Win64 files is called "PE+".
- The "size of optional header" field in the COFF header is 0F0h in a PE+ file and 0E0h in a
PE file.
- The "machine type" in the COFF header is not 14Ch (as it is for x86 processors), but is
8664h (for the AMD64 processor).
- The "magic number" at the beginning of the optional header is 20Bh instead of 10Bh.
- The "majorsubsystemversion" in a PE+ file is 5 instead of 4 in a PE file.
- The executable "image" (the code/data as loaded in memory) of a Win64
file is limited in size to 4GB. This is because the AMD64/EM64T processors use relative addressing
for most instructions, and the relative address is kept in a dword. A dword is only capable of
holding a relative value of ±2GB.
- The import address table (where the loader overwrites the addresses of external calls such
as the addresses of APIs in system Dlls) is enlarged to 64-bits, as is the import look-up table.
This is because the address of external calls could be anywhere in memory.
- The preferred image base, SizeofStackReserve, SizeofStackCommit, SizeofHeapReserve
and SizeofHeapCommit fields in the optional header are enlarged from 4 to 8 bytes.
- The default base address in Win64 is 400000h as in Win32 files.
- In practice you probably can't specify a 64-bit image base because the Microsoft tools do
not differentiate between relocation type 1 (64-bit absolute relocation) and relocation type
2 (32-bit relocation), at least in their early versions anyway.
- 64-bit executables which provide properly for full Win64 exception handling contain a .pdata
section holding the tables required for this.
You can view the internals of the PE file using
Wayne J. Radburn's PEview.
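To make the header differences above concrete, here is a minimal sketch in Python (used simply as a convenient notation; it is not part of the "Go" tools) which reads the fields that distinguish a PE file from a PE+ file: the machine type, the "size of optional header" field, the magic number and the preferred image base. It assumes a well-formed file, and the function names are hypothetical.

```python
import struct

MACHINE_I386 = 0x14C    # x86
MACHINE_AMD64 = 0x8664  # AMD64 / EM64T


def pe_summary(data: bytes) -> dict:
    """Return the header fields which differ between PE and PE+ files."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # The dword at 3Ch gives the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    coff = pe_offset + 4
    (machine,) = struct.unpack_from("<H", data, coff)        # Machine
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)  # SizeOfOptionalHeader
    opt = coff + 20                      # optional header follows the COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x20B:    # PE+: ImageBase is a qword at +24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    elif magic == 0x10B:  # PE: ImageBase is a dword at +28 (after BaseOfData)
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    return {
        "format": "PE+" if magic == 0x20B else "PE",
        "machine": machine,
        "size_of_optional_header": opt_size,
        "image_base": image_base,
    }


def build_test_header(pe_plus: bool) -> bytes:
    """Hypothetical helper: just enough header bytes to exercise pe_summary."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)
    buf[0x80:0x84] = b"PE\0\0"
    coff = 0x84
    struct.pack_into("<H", buf, coff, MACHINE_AMD64 if pe_plus else MACHINE_I386)
    struct.pack_into("<H", buf, coff + 16, 0xF0 if pe_plus else 0xE0)
    opt = coff + 20
    struct.pack_into("<H", buf, opt, 0x20B if pe_plus else 0x10B)
    if pe_plus:
        struct.pack_into("<Q", buf, opt + 24, 0x400000)
    else:
        struct.pack_into("<I", buf, opt + 28, 0x400000)
    return bytes(buf)
```

Applied to the bytes of a real 64-bit executable, `pe_summary` should report "PE+" with machine type 8664h.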
Differences between Win32 and Win64 (for AMD64/EM64T)
Here are the main differences between Win32 and Win64 of relevance to the assembler or
Windows programmer:-
- Calling convention. Win32 uses the STDCALL convention whereas Win64 uses the FASTCALL
convention. In STDCALL all parameters which are sent to an API are PUSHed on the stack.
In Win32 the stack pointer (ESP) is reduced by 4 bytes for each PUSH. In STDCALL it is the
responsibility of the API to restore the stack to equilibrium.
In FASTCALL, the first four parameters are sent to the API in registers (in this order: RCX,RDX,R8 and
R9), but the fifth and subsequent parameters are PUSHed on the stack.
In Win64, the stack pointer (RSP) is reduced by 8 bytes for each PUSH. Unlike STDCALL, it is not
the responsibility of the API to clear up the stack. Instead this must be done by the caller
to the API. The caller must also ensure that there is space on the stack for the API to store
the parameters which are passed in registers. In practice this is achieved by reducing the stack
pointer by 32 bytes just before the call. Note that in GoAsm all the work required by the FASTCALL calling convention is done
automatically if you use INVOKE or ARG followed by INVOKE. See
coding to comply with FASTCALL calling convention.
The use of ARG and INVOKE is described in the relevant part of the
GoAsm manual.
Note that GoAsm does not yet do this for parameters which need
to be sent in the XMM registers (ie. floating point parameters).
- Windows uses the FASTCALL convention to call the window procedures and other callback
procedures in your application. This means that your window procedures will pick up the parameters
in a different way under Win64. Also the window procedures no longer have to
restore the stack to equilibrium.
Note that GoAsm will implement these things automatically if you use FRAME...ENDF.
The use of FRAME...ENDF is described in the relevant part of the
GoAsm manual.
- All functions using a stack frame (including window procedures) need to follow certain
rules if they wish to make use of exception handling. The tools also need to add exception
frame records to the executable. This will also be handled automatically by the "Go" tools.
Note that this is not yet available.
- Register volatility. In Win32, window procedures and other callback procedures have to restore the values in the
EBP,EBX,EDI and ESI registers before returning to the caller (if the values in those registers
are changed). This is something that is also done by the Windows APIs (these registers
will not change when you call an API). These are called the "non-volatile" registers.
In Win64, this list of registers is extended to RBP,RBX,RDI,RSI,R12 to R15 and
XMM6 to XMM15.
The "volatile" registers are those which may be changed by APIs, and which you do
not need to save and restore in your window procedures and other callback procedures.
In Win32 the general purpose volatile registers were EAX,ECX and EDX. These have now
been extended to RAX,RCX,RDX, and R8 to R11.
- You might not have expected this, but in 64-bit assembly for the AMD64, pointers to
code and data whose addresses are within the executable are still only 32-bits.
This ties in with the fact that RIP-relative addressing limits the size of the
executable to 4GB. Pointers to external addresses, such as functions in Dlls, are 64-bit
wide so that the function can be anywhere in memory (see call address sizes).
- In Win64 the data size of all handles and pointers is now 64 bits instead
of 32 bits. See Changes to Windows data types for more.
- In Win64 there are stricter requirements for the alignment of the stack,
data, and for structures (see alignment
of structures and structure members).
- The Windows APIs have been modified to work in 64-bits. There are, however, a small
number of new APIs to handle the extra requirements of 64-bit operation. These include:-
GetClassLongPtr
GetWindowLongPtr
SetClassLongPtr
SetWindowLongPtr
Note that just as in Win32, you can make your application with either the ANSI or the
Unicode version of the APIs. See Writing Unicode programs.
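The FASTCALL and register volatility rules described above can be modelled in a few lines of Python. This is only a sketch, assuming integer or pointer arguments (the XMM registers used for floating point parameters are ignored) and assuming RSP is already 16-byte aligned before the caller reserves its stack space; the function names are hypothetical. It shows where each argument ends up, how much stack the caller reserves including the 32 bytes for the register parameters, and which registers a callback must save.

```python
# First four integer/pointer arguments, in order.
ARG_REGISTERS = ("RCX", "RDX", "R8", "R9")
SHADOW_SPACE = 32  # space left for the API to store RCX, RDX, R8 and R9

# Win64 non-volatile registers: preserved by APIs, and which a window
# procedure or other callback must restore if it changes them.
NON_VOLATILE = {"RBP", "RBX", "RDI", "RSI", "R12", "R13", "R14", "R15"}
NON_VOLATILE |= {f"XMM{i}" for i in range(6, 16)}


def place_arguments(n_args):
    """Return (location of each argument, bytes the caller reserves).

    Arguments 5 onwards sit above the 32-byte area, at [RSP+20h], [RSP+28h]
    and so on, as seen at the moment of the CALL.
    """
    where = []
    for k in range(n_args):
        if k < len(ARG_REGISTERS):
            where.append(ARG_REGISTERS[k])
        else:
            where.append(f"[RSP+{SHADOW_SPACE + 8 * (k - 4):X}h]")
    needed = SHADOW_SPACE + 8 * max(0, n_args - 4)
    # Assumes RSP was 16-byte aligned beforehand: keep it aligned at the CALL.
    reserve = (needed + 15) // 16 * 16
    return where, reserve


def registers_to_save(modified):
    """Registers a callback must save and restore, given those it modifies."""
    return sorted(set(modified) & NON_VOLATILE)
```

For example, VirtualAlloc (four parameters) needs only the 32-byte area, while an API with five parameters needs 40 bytes, rounded up to 48 to keep the stack aligned. GoAsm's INVOKE does all of this for you; the sketch only shows the arithmetic.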
Differences between x86 and x64 processors
The main differences are the expanded register range, some changes to instructions, and the use of
RIP-relative addressing. The notes below refer to the AMD64 in 64-bit mode. In this mode the
AMD64 can also run 32-bit executables naturally.
Registers
The AMD64 adds several new registers to those available in the 86 series of processors, and
also adds new ways to address the existing registers.
- The EAX,EBX,ECX,EDX,ESI,EDI,EBP and ESP "general purpose" registers are all enlarged to 64-bits.
The enlarged registers are accessed using RAX,RBX,RCX,RDX,RSI,RDI,RBP and RSP
- You can still access the low dword of these registers (ie. the least significant 32 bits) by using
the existing names EAX,EBX,ECX,EDX,ESI,EDI,EBP and ESP.
- You can still access the lowest word of these registers (ie. the least significant 16 bits) by using
the existing names AX,BX,CX,DX,SI,DI,BP and SP.
- You can still access the first byte of RAX,RBX,RCX and RDX (ie. the least significant 8 bits)
by using the existing names AL,BL,CL,DL as in the 86 processor. But you can now also address
the first byte of the "index" registers by using SIL,DIL,BPL and SPL. So for example SIL is
the least significant 8 bits of the index register RSI.
- You can still access the second byte of RAX,RBX,RCX and RDX (bits 8 to 15) by using
the existing names AH,BH,CH,DH as in the 86 processor. However, in the AMD64 the encodings for
these registers are reused: in an instruction with a REX prefix (which is needed to address the
byte versions of the extended registers R8 to R15, and SIL,DIL,BPL and SPL) the same encodings
select other registers instead. So you cannot use AH,BH,CH,DH and R8B to R15B (or SIL,DIL,BPL,SPL)
in the same instruction.
- There are eight new 64-bit registers (the "extended registers") named R8 to R15.
- The low dword of these registers (ie. the least significant 32 bits) can be addressed
using the R8D to R15D forms.
- The low word of these registers (ie. the least significant 16 bits) can be addressed
using the R8W to R15W forms.
- The first byte of these registers (ie. the least significant 8 bits) can be addressed
using the R8B to R15B forms.
- There are 8 new XMM (128-bit) registers named XMM8 to XMM15.
- The 64-bit MMX registers (MM0 to MM7) are still available. As in the 86 processor they are
also used as floating point registers (ST0 to ST7) for the x87 floating point instructions.
- The instruction pointer is now in the 64-bit RIP register.
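The register naming scheme above is easiest to see as bit fields of the 64-bit registers. The following Python sketch (illustrative only; the table and function names are hypothetical) maps each name to its parent register, bit offset and width, and also checks the AH/BH/CH/DH restriction, treating R8B to R15B and SIL, DIL, BPL, SPL as the byte registers which need a REX prefix.

```python
def build_alias_table():
    """Map register name -> (64-bit parent register, bit offset, width in bits)."""
    table = {}
    for letter in "ABCD":                      # RAX, RBX, RCX, RDX
        parent = f"R{letter}X"
        table[parent] = (parent, 0, 64)
        table[f"E{letter}X"] = (parent, 0, 32)
        table[f"{letter}X"] = (parent, 0, 16)
        table[f"{letter}L"] = (parent, 0, 8)
        table[f"{letter}H"] = (parent, 8, 8)   # second byte, bits 8 to 15
    for name in ("SI", "DI", "BP", "SP"):      # RSI, RDI, RBP, RSP
        parent = "R" + name
        table[parent] = (parent, 0, 64)
        table["E" + name] = (parent, 0, 32)
        table[name] = (parent, 0, 16)
        table[name + "L"] = (parent, 0, 8)     # SIL, DIL, BPL, SPL (new)
    for n in range(8, 16):                     # R8 to R15 (new)
        parent = f"R{n}"
        table[parent] = (parent, 0, 64)
        table[f"R{n}D"] = (parent, 0, 32)
        table[f"R{n}W"] = (parent, 0, 16)
        table[f"R{n}B"] = (parent, 0, 8)
    return table


ALIASES = build_alias_table()
HIGH_BYTE = {"AH", "BH", "CH", "DH"}
NEEDS_REX = {"SIL", "DIL", "BPL", "SPL"} | {f"R{n}B" for n in range(8, 16)}


def read_register(regs, name):
    """Read a register or sub-register from a dict of 64-bit register values."""
    parent, shift, width = ALIASES[name]
    return (regs[parent] >> shift) & ((1 << width) - 1)


def byte_operands_allowed(*names):
    """False if AH/BH/CH/DH is combined with a byte register needing REX."""
    used = set(names)
    return not (used & HIGH_BYTE and used & NEEDS_REX)
```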
Instructions
- There are some instructions which are not available in the AMD64. The opcodes are now
used for other purposes. The full list is contained in the AMD64 manuals, but includes
AAA, AAD, AAM, AAS, DAA and PUSH and POP operations using CS,DS,ES and SS.
- Instructions are enlarged to allow for the new registers and register forms of address,
for example:-
MOV RAX,immediate ;move a 64-bit number into the 64-bit register
JRCXZ >L1 ;if RCX is zero jump forward to L1
- The string instructions are now enlarged to allow for 64-bit addressing, for example:-
LODSB ;now equivalent to MOV AL,[RSI] then INC RSI
LODSW ;now equivalent to MOV AX,[RSI] then ADD RSI,2
LODSD ;now equivalent to MOV EAX,[RSI] then ADD RSI,4
LODSQ ;new! equivalent to MOV RAX,[RSI] then ADD RSI,8
CMPSB ;now equivalent to CMP B[RSI],B[RDI] then INC RSI,RDI
CMPSQ ;new! equivalent to CMP Q[RSI],Q[RDI] then ADD RSI,8 ADD RDI,8
MOVSW ;now equivalent to MOV W[RDI],W[RSI] then ADD RSI,2 ADD RDI,2
MOVSQ ;new! equivalent to MOV Q[RDI],Q[RSI] then ADD RSI,8 ADD RDI,8
SCASD ;now equivalent to CMP EAX,[RDI] then ADD RDI,4
SCASQ ;new! equivalent to CMP RAX,[RDI] then ADD RDI,8
STOSQ ;new! equivalent to MOV [RDI],RAX then ADD RDI,8
The repeat prefixes REP, REPZ and REPNZ use RCX rather than ECX.
The loop instructions LOOP, LOOPZ and LOOPNZ use RCX rather than ECX.
The table look-up instruction XLATB uses RBX rather than EBX.
- Apart from the above, the only new instruction of any note usable by programmers is MOVSXD
which can move 32-bits of data from a register or from memory into a 64-bit register, sign extending
bit 31 into all higher bits. There are also a handful of new system instructions.
- In the AMD64, each PUSH and POP instruction moves the stack pointer by 8 bytes instead
of 4 bytes as in the 86 processor. This means that PUSH 32-bit register is no longer
a recognised instruction on the AMD64. To help with compatibility of source code, GoAsm treats (for example) PUSH EAX
as equivalent to PUSH RAX. In /x86 mode, GoAsm treats PUSH RAX as equivalent to PUSH EAX.
So it does not really matter which you use.
- PUSH immediate on the AMD64 takes a 32-bit immediate (number) value and sign extends bit 31
into all higher bits. There is no single instruction capable of taking a 64-bit immediate value and
PUSHing that onto the stack. For this reason PUSH ADDR THING is not a recognised instruction
on the AMD64 (the offset value is treated as an immediate). The problem here is that the actual
immediate value of any particular offset is unknown until link-time, and at assemble-time it is
impossible for the assembler to know whether the offset is above 7FFFFFFFh and so would
be affected by the sign extension. Therefore in GoAsm, PUSH ADDR THING is actually coded as:-
PUSH RAX
MOV [RSP],ADDR THING
- The 3DNow! instructions are still available in the AMD64. It's not clear whether
these instructions are now available on processors supporting Intel EM64T technology.
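The sign extension performed by MOVSXD, and by PUSH with an immediate operand, can be checked with a short Python sketch. It is illustrative only (the function names are hypothetical): it shows why an address above 7FFFFFFFh cannot be PUSHed as a 32-bit immediate without being altered, which is the problem described above for PUSH ADDR THING.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend_32_to_64(value: int) -> int:
    """Copy bit 31 of a 32-bit value into bits 32 to 63 (as MOVSXD does)."""
    value &= MASK32
    if value & 0x8000_0000:
        return value | 0xFFFF_FFFF_0000_0000
    return value


def push_imm32_preserves(address: int) -> bool:
    """True if PUSH imm32 of this 32-bit value puts the same qword on the stack."""
    return sign_extend_32_to_64(address) == address
```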
RIP-Relative addressing
Some instructions in the AMD64 processor which address data or code use RIP-relative addressing
to do so. The relative address is contained in a dword which is part of the instruction. When
using this type of addressing, the processor adds three values: (a) the contents of the dword
containing the relative address (b) the length of the instruction and (c) the value of RIP (the
current instruction pointer) at the beginning of the instruction. The resulting value is then
regarded as the absolute address of the data or code to be addressed by the instruction. Since
the relative address can be a negative value, it is possible to address data or code earlier
in the image from RIP as well as later. The range is roughly ±2GB, depending on the
instruction size. Since relative addressing cannot address outside this range, this is the
practical size limit of 64-bit images.
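Here is the address calculation described above written out as a small Python sketch, together with the reverse calculation needed to produce the dword in the first place. The function names are hypothetical and the example addresses are made up; the point is only that the dword is treated as a signed value added to the address of the next instruction.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def rip_relative_target(rip: int, insn_length: int, disp32: int) -> int:
    """Address referenced by a RIP-relative operand.

    rip is the address of the start of the instruction; disp32 is the raw
    dword from the instruction, interpreted as a signed value.
    """
    if disp32 & 0x8000_0000:
        disp32 -= 1 << 32          # negative: refers to an earlier address
    return (rip + insn_length + disp32) & MASK64


def displacement_for(rip: int, insn_length: int, target: int) -> int:
    """The dword which must be stored so that the instruction reaches target."""
    delta = target - (rip + insn_length)
    if not -(1 << 31) <= delta < (1 << 31):
        raise ValueError("target out of the +/-2GB RIP-relative range")
    return delta & 0xFFFF_FFFF
```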
RIP-relative addressing happens "behind the back" of the user. The processor uses it if the
opcodes contain certain values (in the ModRM byte, the Mod field equals 00 binary, and the r/m
field equals 101 binary). You cannot control this except by changing the type of
instructions you use. Generally, here are the rules which govern whether or not an instruction
uses RIP-relative addressing:-
- Addresses in data cannot use RIP-relative addressing since the value of RIP cannot be
known at the time when those addresses are set. Instead, an absolute address for insertion
is calculated at link-time. So for example the following instructions do not use
RIP-relative addressing but instead use absolute addresses:-
MyDataLabel1 DQ MyDataLabel3 ;address of data label
MyDataLabel2 DQ MyCodeLabel ;address of code label
MyDataLabel3 DQ $ ;using current data pointer
MyDataLabel4 DD MyDataLabel3 ;address of data label
MyDataLabel5 DT MyCodeLabel ;address of code label
MyDataLabel6 DD $ ;using current data pointer
Note that in practice, the absolute address is contained in a dword and not in a qword. This is why
in the above examples data and code addresses can be contained within a dword data declaration.
This restriction is feasible because the practical image size is limited to 4GB anyway because
of the restrictions imposed by RIP-relative addressing.
- Offsets converted to immediate values either at assemble-time or at link-time use
absolute addressing rather than relative addressing. For example the following instructions
do not use RIP-relative addressing but instead use absolute addresses:-
MOV RAX,ADDR MyDataLabel3 ;address of data label put in register
MOV MM0,ADDR MyCodeLabel ;address of code label put in register
MOV Q[RSP],ADDR MyDataLabel3 ;address of data label put in memory location
MOV Q[RSP],ADDR MyCodeLabel ;address of code label put in memory location
GoAsm actually codes MOV RAX,ADDR MyDataLabel3 and similar instructions using the shorter
LEA instruction, which does use RIP-relative addressing.
- Here are examples of other instructions which use RIP-relative addressing:-
MOV RAX,[MyDataLabel3+55h] ;address of data label
RCL Q[MyDataLabel3],1 ;address of data label
MOV Q[MyDataLabel3],20h ;address of data label
PAVGUSB MM3,[MyDataLabel3] ;a 3DNow! instruction
CALL ExitProcess ;address of code label (system API)
JMP InternalCodeLabel ;address of code label inside the module
CALL InternalCodeLabel ;address of code label inside the module
CALL ExternalCodeLabel ;address of code label outside the module
PUSH [MyData] ;saving the contents of a data label
POP [MyData] ;restoring the contents of a data label
Note that in the case of an external call, the relative address points to the Import
Address Table. Since the table is now enlarged to 64-bits, it is possible to call a code label
anywhere in memory.
- LEA uses RIP-relative addressing, for example:-
LEA RBX,MyDataLabel3 ;load into RBX address of data label
- RIP-relative addressing is not used where the data or code label is supplemented by
an index register. Although this may seem odd, the reason appears to be that adding
information about the register to the opcodes means that the processor can no longer
recognise the instruction as one which uses RIP-relative addressing (in the ModRM byte,
the Mod field is no longer 00 binary or the r/m field is no longer 101 binary).
This means that the following instructions use absolute addresses rather than RIP-relative
ones:-
MOV RAX,[ESI+MyData]
RCL Q[EBX+MyData],1
MOV Q[RSI*2+MyData],44444444h
PAVGUSB MM3,[R12+MyData]
LEA RBX,MyData+RSI
CALL [MyCall+RDI]
JMP [MyJump2+RDI]
PUSH [MyCall+RSI]
POP [MyCall+R12]
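The ModRM test described above can be written as a tiny Python sketch (illustrative only; the function names are hypothetical). For example, MOV RAX,[MyData] assembles to 48 8B 05 followed by the dword, and 05h is a ModRM byte with Mod = 00 binary and r/m = 101 binary. An address formed from RSI plus a 32-bit displacement uses a ModRM byte of 86h (Mod = 10 binary, r/m = 110 binary), and a scaled index such as RSI*2 uses r/m = 100 binary to call for a SIB byte.

```python
def split_modrm(modrm: int):
    """Return the (Mod, reg, r/m) fields of a ModRM byte."""
    return modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111


def is_rip_relative(modrm: int) -> bool:
    """In 64-bit mode, Mod = 00b with r/m = 101b means [RIP + disp32]."""
    mod, _reg, rm = split_modrm(modrm)
    return mod == 0b00 and rm == 0b101
```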
Bearing in mind that the image size is limited to 4GB by the above arrangements, it might be
thought that the advantages of RIP-relative addressing are somewhat limited. This seems to
be the case. It appears that the only advantage is that it lessens the number of relocations
which would need to be carried out by the loader if a DLL is loaded at an address which is
unexpected. The loader then would need to adjust all absolute addresses to suit the actual
image base, but relative addresses would not have to be altered since they refer to other
parts of the virtual image of the executable itself. However, it is good practice for the
programmer to choose a suitable image base at link-time to avoid the need for relocations in
a DLL in the first place. A good example of this is the system DLLs themselves. They all
have a different image base which effectively avoids any prospective clashes of the image
in memory which would require relocation at load-time.
Call address sizes
In 64-bit assembly, a simple call to a code label eg.
CALL CALCULATE
will be coded as an E8 RIP-relative call, using a dword to provide the offset from RIP.
The destination of this call might be an internal code label (ie. a procedure or function
within the executable itself). Or it might be to an external code label, such as
an API in a system Dll or to a code label exported by another exe or Dll. The first
destination of a call to an external code label is to the Import Address Table which
is part of the executable itself. This table is written over by the loader when the
executable starts. Therefore during run-time the table contains the absolute addresses
in virtual memory of the eventual destination of the call. In a 64-bit executable,
the table contains 64-bit values, so the E8 RIP-relative call is capable of calling a procedure
or function anywhere in memory.
Calls to memory addresses either held in a label, or in registers, or in
memory pointed to by registers, however, are dealt with in a different way. They are
not channelled through the Import Address Table. These calls must also permit the
destination of the call to be anywhere in memory. In order to achieve this they must
themselves use 64-bit absolute addresses. Examples of these types of calls are:-
CALL RAX
CALL EAX ;codes the same as CALL RAX
CALL [Table+8h]
CALL [RSI]
CALL [ESI] ;codes the same as CALL [RSI]
Here you need to be careful that you are in fact giving a qword to the call, and not just a dword.
See some pitfalls to avoid when converting existing source code.
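The danger of giving the call only a dword is easy to demonstrate with a short Python sketch which models memory as bytes (illustrative only; the function names, the labels Proc1 and Proc2 and the addresses are hypothetical). If a table of procedure addresses is declared with dword entries, a 64-bit CALL [Table] still fetches a qword, so it picks up two entries joined together; declaring qword entries gives the intended address.

```python
import struct


def dword_table(*addresses):
    """Like:  Table DD Proc1,Proc2  (32-bit entries)."""
    return struct.pack(f"<{len(addresses)}I", *addresses)


def qword_table(*addresses):
    """Like:  Table DQ Proc1,Proc2  (64-bit entries)."""
    return struct.pack(f"<{len(addresses)}Q", *addresses)


def indirect_call_target(table: bytes, offset: int) -> int:
    """CALL [Table+offset] in 64-bit code always fetches a qword from memory."""
    (target,) = struct.unpack_from("<Q", table, offset)
    return target
```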
Changes to Windows data types
Here is a list of the changes to data types between 32 and 64-bits:-
All handles now qwords not dwords
eg.
HACCEL, HINSTANCE, HBRUSH, HBITMAP
HCOLORSPACE, HCURSOR, HDC, HFONT
HICON, HKEY, HLOCAL
HMENU, HMODULE, HPEN, HPALETTE, HWND
(and others starting with H)
exceptions:- HRESULT, HFILE which remain dwords, and HALF_PTR (see below)
All pointers now qwords not dwords
eg.
LPCSTR, LPCTSTR, LPLONG, LPSTR
(and others starting with LP)
PBOOL, PHANDLE, PHKEY, PVOID
(and others starting with P)
DWORD_PTR, ULONG_PTR, UINT_PTR
(and others ending with _PTR)
and LRESULT
exceptions:- HALF_PTR and UHALF_PTR, which are now dwords instead of words
and POINTER_32 which remains a 32-bit pointer
WPARAM and LPARAM now qwords not dwords
Here is a list of the data types which remain the same:-
ATOM remains a word
BOOL remains a dword
CHAR remains a byte
DWORDLONG remains a qword
COLORREF remains a dword
INT remains a dword
INT32 remains a dword
INT64 remains a qword
LANGID remains a word
LCTYPE remains a dword
LCID remains a dword
LGRPID remains a dword
LONG remains a dword
LONG32 remains a dword
LONG64 remains a qword
LONGLONG remains a qword
POINT remains two dwords
RECT remains four dwords
SHORT remains a word
UINT remains a dword
UINT32 remains a dword
UINT64 remains a qword
ULONG remains a dword
ULONG32 remains a dword
ULONG64 remains a qword
ULONGLONG remains a qword
USHORT remains a word
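The rules in the two lists above can be expressed as a small Python lookup, which may help when converting structure definitions. It is a sketch covering only the types and naming rules given in this section (the function name is hypothetical); any other name raises an error rather than guessing. The fixed types are checked first, since POINT would otherwise be caught by the rule for names starting with P.

```python
# Types listed as unchanged, plus the stated exceptions (sizes in bytes).
FIXED = {
    "ATOM": 2, "BOOL": 4, "CHAR": 1, "DWORDLONG": 8, "COLORREF": 4,
    "INT": 4, "INT32": 4, "INT64": 8, "LANGID": 2, "LCTYPE": 4, "LCID": 4,
    "LGRPID": 4, "LONG": 4, "LONG32": 4, "LONG64": 8, "LONGLONG": 8,
    "POINT": 8, "RECT": 16, "SHORT": 2, "UINT": 4, "UINT32": 4, "UINT64": 8,
    "ULONG": 4, "ULONG32": 4, "ULONG64": 8, "ULONGLONG": 8, "USHORT": 2,
    "HRESULT": 4, "HFILE": 4, "POINTER_32": 4,
}


def win_type_size(name: str, x64: bool) -> int:
    """Size in bytes of a Windows data type, following the rules above."""
    ptr = 8 if x64 else 4
    if name in FIXED:
        return FIXED[name]
    if name in ("HALF_PTR", "UHALF_PTR"):
        return 4 if x64 else 2
    if name in ("WPARAM", "LPARAM", "LRESULT"):
        return ptr
    # Handles (H...), pointers (LP..., P...) and ..._PTR types.
    if name.startswith(("H", "LP", "P")) or name.endswith("_PTR"):
        return ptr
    raise KeyError(f"{name} is not covered by the rules in this section")
```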
Using the switched type indicator
The above change of a data type may require a corresponding change to a type indicator. The letter P is reserved as a type indicator in all situations when
GoAsm might expect to find one. So you can have this switch:-
#if x64
P = 8
#else
P = 4
#endif
P can be switched to the equivalent of any of the pre-defined type
indicators, that is B, W, D, Q or T. In this case it is switched either
to Q (value 8) or to D (value 4). Therefore you can control the size
of the instruction with it, for example:-
MOV P[RDI],0 ;zero a qword at RDI if 64-bit, dword at EDI if 32-bit
LOCAL POINTERS[10]:P ;make 80 byte local pointer buffer if 64-bit, 40 byte if 32-bit
Alignment requirements
The requirements of the system in Win64 for correct alignment of the stack pointer,
data, and structure members are much stricter than in Win32. Wrong alignment
can cause at best a loss of performance and at worst an exception or program exit.
Stack alignment
The stack pointer (RSP) must be 16-byte aligned when making a
call to an API. However, this is organised automatically by GoAsm if you use
INVOKE (see automatic stack alignment).
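INVOKE deals with stack alignment for you, but the arithmetic is simple. When a procedure is entered, the CALL has just pushed an 8-byte return address, so a stack which was 16-byte aligned at the call is now 8 bytes off. The Python sketch below (illustrative only; the function name and the example RSP value are hypothetical) finds the smallest amount to subtract from RSP which gives at least the space needed and leaves RSP 16-byte aligned.

```python
def frame_adjustment(rsp: int, needed: int) -> int:
    """Smallest n >= needed such that RSP - n is a multiple of 16."""
    return needed + (rsp - needed) % 16


ALIGNED_AT_CALL = 0x7FF0          # hypothetical RSP, 16-byte aligned
on_entry = ALIGNED_AT_CALL - 8    # the CALL pushed the return address

# 32 bytes for the four register parameters of the next API call:
print(hex(frame_adjustment(on_entry, 0x20)))   # 0x28
```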
Data alignment
All data must be aligned on a "natural boundary". So a byte can be byte-aligned, a word
should be 2-byte aligned, a dword should be 4-byte aligned, and a qword should be 8-byte
aligned. A tword should also be qword aligned. GoAsm deals with alignment automatically
for you when you declare local data (within a FRAME or USEDATA area). But you will need
to organise your own data declarations to ensure that the data is properly aligned. The
easiest way to do this is to declare all qwords first, then all dwords, then all words
and finally all bytes. Twords (being 10 bytes) would put out the alignment for later
declarations, so you could declare all those first and then put the data back into
alignment ready for the qwords by using ALIGN 8.
As for strings, in accordance with the above rules, Unicode strings must be
2-byte aligned, whereas ANSI strings can be byte aligned.
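Why the order of declarations matters can be shown with a small Python sketch which places data declarations one after another, each on its natural boundary. It assumes the data section itself starts on an 8-byte boundary and counts a tword as needing qword alignment, as above; the labels and function names are hypothetical.

```python
def natural_alignment(size: int) -> int:
    """Byte, word, dword and qword align on their size; a tword on 8."""
    return 8 if size == 10 else size


def align_up(offset: int, boundary: int) -> int:
    return (offset + boundary - 1) // boundary * boundary


def lay_out(declarations):
    """Place (label, size) declarations in order; return offsets and padding."""
    offset, padding, offsets = 0, 0, {}
    for label, size in declarations:
        aligned = align_up(offset, natural_alignment(size))
        padding += aligned - offset
        offsets[label] = aligned
        offset = aligned + size
    return offsets, padding
```

With a byte declared before a qword, 7 bytes of padding are needed; declaring the qword first needs none. A tword followed by a qword needs 6 bytes, which is what ALIGN 8 would supply.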
When structures are used they need to be aligned on the natural boundary of the
largest member. All structure members must also be aligned properly, and the structure
itself needs to be padded to end on a natural boundary (the system can write in
this area). Because of the importance of this, from Version 0.56 (beta), GoAsm aligns structures
automatically for you. See automatic alignment and padding of
structures and structure members for more.
Windows structures in 64-bit programming
Windows often uses structures to send and receive information using the APIs. In 64-bits
these structures are likely to be significantly different from their 32-bit counterparts
because of the enlargement of many data types to 64-bits.
See changes to Windows data types.
Take for example the WNDCLASS structure which is used when you want to register a window class:-
WNDCLASS STRUCT
style DD 0 ;+0 window class style
DD 0 ;+4 padding for next
lpfnWndProc DQ 0 ;+8 pointer to Window Procedure
DD 0 ;+10 no. of extra bytes to allocate after structure
DD 0 ;+14 no. of extra bytes to allocate after window instance
hInstance DQ 0 ;+18 handle to instance containing window procedure
hIcon DQ 0 ;+20 handle to the class icon
hCursor DQ 0 ;+28 handle to the class cursor
hbrBackground DQ 0 ;+30 identifies the class background brush
lpszMenuName DQ 0 ;+38 pointer to resource name for class menu
lpszClassName DQ 0 ;+40 pointer to string for window class name
ENDS
A number of the members are now qwords, whereas previously they were dwords as you can
see from the 32-bit version below. The class style at offset +0h remains a dword, but then
in the 64-bit version, padding of four bytes is required because the next member is a
qword. This complies with the requirement that structure members are aligned on their natural
boundary. A qword is used to provide space for the pointers firstly to the window procedure
itself at +8h, to menu name at +38h and to the window class name at +40h. This is despite
the fact that 64-bit programming as implemented by Win64 for the AMD64 processor only uses 32-bit
pointers where those pointers give the addresses of internal data. Presumably the reason
for this is that the same structures are being used here as are used for the IA64 family of
processors (which use 64-bit pointers to internal data). Handles in the structure are also
enlarged to 64-bits.
WNDCLASS STRUCT
style DD 0 ;+0 window class style
lpfnWndProc DD 0 ;+4 pointer to Window Procedure
DD 0 ;+8 no. of extra bytes to allocate after structure
DD 0 ;+C no. of extra bytes to allocate after window instance
hInstance DD 0 ;+10 handle to instance containing window procedure
hIcon DD 0 ;+14 handle to the class icon
hCursor DD 0 ;+18 handle to the class cursor
hbrBackground DD 0 ;+1C identifies the class background brush
lpszMenuName DD 0 ;+20 pointer to resource name for class menu
lpszClassName DD 0 ;+24 pointer to string for window class name
ENDS
Here is another example, this time the structure DRAWITEMSTRUCT. First, let's have a look at the 32-bit version in the form you would find it in the SDK:-
UINT CtlType ;+0
UINT CtlID ;+4
UINT itemID ;+8
UINT itemAction ;+C
UINT itemState ;+10
HWND hwndItem ;+14
HDC hDC ;+18
RECT rcItem ;+1C
ULONG_PTR itemData;+2C
(total size of structure is 30h bytes)
In 64-bits this structure becomes:-
UINT CtlType ;+0
UINT CtlID ;+4
UINT itemID ;+8
UINT itemAction ;+C
UINT itemState ;+10
padding dword ;+14
HWND hwndItem ;+18
HDC hDC ;+20
RECT rcItem ;+28
ULONG_PTR itemData;+38
(total size of structure is 40h bytes)
It is also a requirement that the structure is enlarged so that it ends on
the natural boundary of its largest member. This is achieved by adding the
necessary padding at the end of the structure. So PAINTSTRUCT becomes:-
PAINTSTRUCT STRUCT
DQ 0 ;+0 hDC
DD 0 ;+8 fErase
left DD 0 ;+C left )
top DD 0 ;+10 top ) RECT
right DD 0 ;+14 right )
bottom DD 0 ;+18 bottom )
DD 0 ;+1C fRestore
DD 0 ;+20 fIncUpdate
DB 32 DUP 0 ;+24 rgbReserved
DD 0 ;+44 padding to bring the total size to 72 bytes
ENDS
In practice it was found that the system wrote to the area of padding at +44h when
using PAINTSTRUCT in certain circumstances. This shows the importance of complying
with these rules (otherwise you could find that data after the structure could be
written over).
Note that the beginning of structures must be aligned on the natural boundary
of the largest member as well. All the above rules ensure, therefore, that qwords
in the structure are always qword aligned.
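The structure rules above (each member on its natural boundary, the structure aligned and padded to the boundary of its largest member) are mechanical enough to compute. The following Python sketch is illustrative only: the member lists are transcribed from the listings in this section, RECT is treated as four dwords (so it needs only dword alignment), and the function names are hypothetical. It reproduces the offsets shown for DRAWITEMSTRUCT and PAINTSTRUCT.

```python
def struct_layout(members):
    """members: list of (name, size, alignment) in bytes.

    Returns (offsets, total size, alignment of the structure itself).
    """
    offsets, offset, largest = {}, 0, 1
    for name, size, align in members:
        offset = (offset + align - 1) // align * align  # natural boundary
        offsets[name] = offset
        offset += size
        largest = max(largest, align)
    total = (offset + largest - 1) // largest * largest  # pad the end
    return offsets, total, largest


def drawitemstruct(ptr_size):
    """DRAWITEMSTRUCT members; HWND, HDC and ULONG_PTR are pointer-sized."""
    p = (ptr_size, ptr_size)
    return [
        ("CtlType", 4, 4), ("CtlID", 4, 4), ("itemID", 4, 4),
        ("itemAction", 4, 4), ("itemState", 4, 4),
        ("hwndItem", *p), ("hDC", *p),
        ("rcItem", 16, 4),          # RECT: four dwords
        ("itemData", *p),
    ]


PAINTSTRUCT64 = [
    ("hdc", 8, 8), ("fErase", 4, 4), ("rcPaint", 16, 4),
    ("fRestore", 4, 4), ("fIncUpdate", 4, 4), ("rgbReserved", 32, 1),
]
```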
Automatic alignment and padding of structures and structure members
As we have seen, correct alignment of structures and structure members is crucial
for proper operation of 64-bit code. Unfortunately the Windows header files
containing the structure definitions do not necessarily contain the necessary
padding to achieve such alignment.
So from Version 0.56 (beta), GoAsm does this work automatically for you as follows:-
- GoAsm always aligns the structure itself to the correct data boundary.
- GoAsm always pads if necessary to ensure that structure members are on their
natural boundary. So in the MSG structure example below, the padding at +0Ch
could be left out. It would be inserted automatically.
- GoAsm always adds padding at the end of a structure so that the structure
ends on a natural boundary. So in the example below the padding at +2Ch could be
left out. It would be inserted automatically.
- The symbols created when using a structure are automatically adjusted to suit
the alignment and padding which is applied.
MSG DQ 0 ;+0h hWnd
DD 0 ;+8h message
DD 0 ;padding for next
DQ 0 ;+10h wParam
DQ 0 ;+18h lParam
DD 0 ;+20h time
DD 0 ;+24h 1st part of point structure
DD 0 ;+28h 2nd part of point structure
DD 0 ;+2Ch padding to bring the overall size to 48 bytes
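The padding required by the rules listed above can also be shown explicitly. This Python sketch is illustrative only: it models those rules rather than GoAsm's own code, and the function name is hypothetical. It takes the MSG members without any padding and returns each member and each padding gap with its offset, matching the comments in the MSG listing.

```python
def with_padding(members):
    """members: list of (name, size, alignment).

    Returns [(offset, name, size)] with explicit "padding" entries.
    """
    out, offset, largest = [], 0, 1
    for name, size, align in members:
        gap = -offset % align            # bytes needed to reach the boundary
        if gap:
            out.append((offset, "padding", gap))
            offset += gap
        out.append((offset, name, size))
        offset += size
        largest = max(largest, align)
    tail = -offset % largest             # pad the end to the largest boundary
    if tail:
        out.append((offset, "padding", tail))
    return out


MSG64 = [
    ("hWnd", 8, 8), ("message", 4, 4), ("wParam", 8, 8),
    ("lParam", 8, 8), ("time", 4, 4), ("pt", 8, 4),   # POINT: two dwords
]
```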
You can see what alignment and padding GoAsm has added to your source code if you
specify /l in GoAsm's command line. This will create a list file. Also you can
view the effect in a debugger.
Structures - the overall picture
- If you are writing source code for both 32 and 64-bit versions of your program, this
will be made much easier if you use conditional assembly to switch the correct structures
at assemble-time, and then instead of filling the structures using the offset values, you
fill them using the member names. Using this method, GoAsm finds the correct offset for
you automatically. This technique has been used in the demonstration file Hello64World3.
- You can use conditional assembly to switch whole banks of structures in one go. These
can be contained in include files containing 32-bit structures and 64-bit structures
respectively.
- Since GoAsm aligns and pads the structures automatically for you, you can use
the 64-bit structure definitions already available in include files, or you can make
your own from the Windows header files using Wayne J Radburn's
xlatHinc utility.
Choice of register
- One main thing to remember is that all Windows handles are 64 bits, so the APIs will provide them
in RAX rather than in EAX.
- The same goes for Windows pointers. For example, you may ask Windows for some memory. The address
of the memory will be returned in RAX and not in EAX.
So this means that:-
ARG 4h,3000h,EDX,0
INVOKE VirtualAlloc ;reserve and commit edx bytes of read/write memory
MOV [EAX],66666666h ;insert a number at the beginning of that memory
is bad 64-bit coding, whereas
ARG 4h,3000h,EDX,0
INVOKE VirtualAlloc ;reserve and commit edx bytes of read/write memory
MOV [RAX],66666666h ;insert a number at the beginning of that memory
is good.
- Since all pointers to internal data and code labels are 32-bits, in theory it is possible
to use the 32-bit versions of the general purpose registers (EAX to ESP) for all such pointers
so for example, you could use MOV [ESI],AL instead of MOV [RSI],AL.
However, I do advise against this for the following five reasons:-
- It means you have to keep track of which pointers are internal ones and which are
external ones. You must allow for the external ones being 64-bits.
- You may need two sets of procedures which are oft-used in your program, one using
32-bit register pointers and one using 64-bit register pointers.
- The string instructions such as LODSB, MOVSW, STOSD, CMPSQ and SCASB use RSI and RDI
in a 64-bit program rather than ESI and EDI. And the repeat prefixes REP, REPZ and REPNZ
use RCX instead of ECX.
- Using 32-bit registers as pointers in a 64-bit program codes one byte larger than using the
64-bit registers. This is because in a 64-bit program, MOV [RSI],AL is
the default and to convert this to MOV [ESI],AL requires a 67h address-size override byte.
- You can still use the same source code to make both 32-bit and 64-bit programs provided
you only use the general purpose registers, RAX to RSP. This is because when you use the /x86
switch with GoAsm these registers are automatically regarded as EAX to ESP instead.
You can automate the required changes to existing 32-bit code using AdaptAsm.
- If you need to use the R8 to R15 registers, remember that R8 to R11 are volatile (they will
not be maintained by the APIs). If you use the non-volatile R12 to R15 registers within window
procedures and callback procedures then you must ensure that they are restored after use. This
can be done by using PUSH at the beginning and POP at the end of the procedure which uses them, or
by using the USES statement.
- When passing parameters to an API using INVOKE, you may need to take into account that
in the FASTCALL calling convention the parameters have to be sent to the API in the RCX,RDX,R8 and
R9 registers. Therefore you would not wish to pass parameters in registers which will be overwritten
by GoAsm (you will get an error message if you try to do this).
For example this is bad and will show an error:-
INVOKE MessageBoxW,RDX,R8,R9,R10
It's bad because if it were allowed, it would translate to:-
MOV R9,R10
MOV R8,R9
MOV RDX,R8
MOV RCX,RDX
so it can be seen that the contents of the registers are being overwritten before they
are being used to establish the parameters.
Better would be:-
INVOKE MessageBoxW,R10,R11,R8,RDX
Which translates to:-
MOV R9,RDX
MOV RDX,R11
MOV RCX,R10
Note that GoAsm does not bother to code MOV R8,R8
Even better would be:-
INVOKE MessageBoxW,RCX,RDX,R8,R9
which requires no further code to pass the parameters since they are already in the correct
registers. So this is very efficient code.
See also some tips to reduce the size of your code which has some additional
implications for your choice of registers
and also some pitfalls to avoid when converting existing source code.
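The register conflict can be modelled in a few lines. The Python sketch below is a hypothetical model, not GoAsm's implementation. It makes three assumptions, following the translations above:
- The last parameter is dealt with first: any parameters after the fourth are pushed, then R9, R8, RDX and finally RCX are loaded.
- A MOV of a register to itself is omitted.
- Reading a register after an earlier instruction in the sequence has overwritten it is an error.

Memory operands and immediates are passed through unchanged, and the stack alignment code is left out.

```python
ARG_REGS = ["RCX", "RDX", "R8", "R9"]  # FASTCALL registers for parameters 1 to 4


def invoke_sequence(params):
    """Return the instructions a model INVOKE would emit for params (first parameter first).

    Raises ValueError if a parameter register would be read after an earlier
    instruction in the emitted sequence has already overwritten it.
    """
    code = []
    clobbered = set()
    for index in reversed(range(len(params))):  # last parameter first
        src = params[index]
        if src in clobbered:
            raise ValueError(f"parameter {index + 1} ({src}) is overwritten before use")
        if index >= len(ARG_REGS):
            code.append(f"PUSH {src}")          # parameters 5 onwards go on the stack
            continue
        dest = ARG_REGS[index]
        if src != dest:                         # no MOV R8,R8 and the like
            code.append(f"MOV {dest},{src}")
            clobbered.add(dest)
    return code


print(invoke_sequence(["R10", "R11", "R8", "RDX"]))
# ['MOV R9,RDX', 'MOV RDX,R11', 'MOV RCX,R10']
try:
    invoke_sequence(["RDX", "R8", "R9", "R10"])
except ValueError as err:
    print("error:", err)
```

With more than four parameters the model pushes the extra ones first. For example, the INVOKE MoveWindow,[hWnd],RAX,RCX,RDX,R8,0 line in the zero-extension section passes the check, because R8 is pushed before it is overwritten.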
Zero-extension of results into 64-bit registers
Take care when mixing the 64-bit registers and their 32-bit counterparts because the processor
can change the contents of the whole 64-bit register when this is
not obvious. This is because when writing results to a 32-bit register the processor will
zero-extend the result into the whole 64-bits of the register. So, for example:-
MOV RAX,-1 ;fill RAX with 0FFFFFFFF FFFFFFFFh
AND EAX,0F0F0F0Fh ;(apparently) work only on EAX
but the processor will zero extend the result into RAX, in other words it will zero
the whole of the high dword of RAX. The result in RAX is 00000000 0F0F0F0Fh not 0FFFFFFFF 0F0F0F0Fh as
expected. This happens irrespective of the value of bit 31 of RAX (this is not the same as sign-extension).
A similar thing happens when using other instructions. Here is an example with XOR:-
MOV RAX,-1 ;fill RAX with 0FFFFFFFF FFFFFFFFh
XOR EAX,EAX ;(apparently) zero EAX
The actual result in RAX is zero.
It also happens with the MOV instruction, for example:-
MOV RCX,1111111111111111h
MOV ECX,88888888h
The result is RCX=88888888h
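The rule can be stated as a few lines of arithmetic. This Python sketch models a 64-bit register as an integer, and the helper names are hypothetical. It reproduces the three examples above. For contrast, it also models a 16-bit write, which on x64 leaves the rest of the register unchanged (unlike a 32-bit write).

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def write32(result):
    """New 64-bit register value after a 32-bit result is written (e.g. to EAX).

    The result is zero-extended: the high dword is always cleared.
    """
    return result & MASK32


def write16(old64, result):
    """New 64-bit register value after a 16-bit result is written (e.g. to AX).

    16-bit (and 8-bit) writes leave the other 48 (or 56) bits unchanged.
    """
    return (old64 & ~0xFFFF & MASK64) | (result & 0xFFFF)


rax = MASK64                                               # MOV RAX,-1
rax_after_and = write32((rax & MASK32) & 0x0F0F0F0F)       # AND EAX,0F0F0F0Fh
rax_after_xor = write32((rax & MASK32) ^ (rax & MASK32))   # XOR EAX,EAX
rcx_after_mov = write32(0x88888888)                        # MOV ECX,88888888h (RCX held 1111111111111111h)
rax_after_mov16 = write16(rax, 0x1234)                     # MOV AX,1234h, for contrast

print(hex(rax_after_and))    # 0xf0f0f0f
print(hex(rax_after_mov16))  # 0xffffffffffff1234
```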
You can take advantage of zero-extension in various ways. Some examples are given in
some tips to reduce the size of your code. Take also this example, where
the structure RECT (which is four dwords) contains values which must be passed to the API MoveWindow
as qwords:-
MOV RBX,ADDR RECT
MOV EAX,[RBX] ;get x-pos
MOV ECX,[RBX+4] ;get y-pos
MOV EDX,[RBX+8] ;get right
SUB EDX,EAX ;get width
MOV R8D,[RBX+0Ch] ;get bottom
SUB R8D,ECX ;get height
INVOKE MoveWindow,[hWnd],RAX,RCX,RDX,R8,0
Here only 32-bit registers are used to extract the information from the RECT structure, but
we know that the high parts of the 64-bit versions of those registers are set to zero.
It is possible that there is a performance loss in relying on zero-extension. Some of the
documentation suggests that the processor has to carry out an additional operation to zero
the high bits of the register.
Sign-extension of results into qwords
You may wonder about the difference between the following instructions:-
MOV D[THING],12345678h
MOV Q[THING],12345678h
These code differently and do different things. The dword version places the value 12345678h
into the dword at the label THING as you would expect. The qword version does the same, but
also zeroes the dword at THING+4. This is because it sign-extends the result into
the qword at the label THING. So if the high bit is set, the qword version will fill THING+4
with 0FFFFFFFFh instead. In other words, the 32-bit value in these instructions is regarded as a
signed number, and written to memory accordingly.
MOV D[THING],12345678h ;THING is now 12345678h (as dword)
MOV Q[THING],12345678h ;THING is now 12345678h (as qword)
MOV D[THING],82345678h ;THING is now 82345678h ie. -7DCBA988h (as dword)
MOV Q[THING],82345678h ;THING is now 0FFFFFFFF 82345678h ie. -7DCBA988h (as qword)
The same happens if you use a register to address the data area for example:-
MOV RSI,ADDR THING
MOV D[RSI],12345678h ;THING is now 12345678h (as dword)
MOV Q[RSI],12345678h ;THING is now 12345678h (as qword)
MOV Q[RSI],82345678h ;THING is now 0FFFFFFFF 82345678h ie. -7DCBA988h (as qword)
Note that even in 64-bit code the MOV instruction cannot write an immediate of more than 32 bits
directly to memory, so this shows an error:-
MOV Q[THING],123456789ABCDEFh
Instead, to achieve this result you would use the following code:-
MOV RAX,123456789ABCDEFh
MOV [THING],RAX
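The sign-extension rule can be written out as arithmetic. This Python sketch models the behaviour described above, and the function name is hypothetical. It sign-extends the 32-bit immediate to the qword that ends up at THING, and rejects immediates that do not fit in 32 bits.

```python
def mov_q_imm32(imm):
    """Qword stored by MOV Q[mem],imm, where imm must fit in 32 bits.

    The 32-bit immediate is sign-extended: if bit 31 is set the high dword
    becomes 0FFFFFFFFh, otherwise it becomes zero.
    """
    if not -0x8000_0000 <= imm <= 0xFFFF_FFFF:
        raise ValueError("immediate does not fit in 32 bits; load it into a register first")
    imm &= 0xFFFF_FFFF                    # the 32-bit pattern actually encoded
    if imm & 0x8000_0000:                 # bit 31 set: a negative number
        return imm | 0xFFFF_FFFF_0000_0000
    return imm


stored = [mov_q_imm32(v) for v in (0x12345678, 0x82345678)]
print([f"{q:016X}" for q in stored])      # ['0000000012345678', 'FFFFFFFF82345678']
try:
    mov_q_imm32(0x123456789ABCDEF)
except ValueError as err:
    print("error:", err)
```

Read as a signed qword, the second value is -7DCBA988h, which agrees with the comments in the listing above.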
Automatic stack alignment
The stack pointer (RSP) must be 16-byte aligned when making a call to an API. With some
APIs this does not matter, but with other APIs wrong stack alignment will cause an exception.
Some APIs will handle the exception themselves and align the stack as required
(this will, however, cause performance to suffer). Other APIs (at least on early builds
of x64) cannot handle the exception and unless you are running the application under debug
control, it will exit.
Because of this requirement, the Win64 documentation states that you can only call an API
within a stack frame. This is because it is assumed that only within a stack frame can the
stack be guaranteed to be aligned properly. A call out of the stack frame will misalign the
stack by 8 bytes.
This requirement is very restrictive to assembler programmers, and causes compilers a big
headache. GoAsm's solution to this problem is to insert special coding before and after each
API call (when INVOKE is used) to ensure that the stack is always properly aligned at the time
of the call. This liberates the assembler programmer, and means that:-
- Calls to APIs (using INVOKE) can be made anywhere in your code. They can be made from
procedures called by other procedures without worrying about the stack pointer.
- PUSHes and POPs can be used in the usual way to save and restore registers, memory addresses
and contents of memory without having to worry that this puts the stack out of alignment.
- You can use the same source code both for 32-bit and 64-bit versions of your application
(there is no requirement for stack alignment in 32-bits).
The overhead for aligning the stack at the time of each API call is an additional nine bytes per
API, which seems a small price to pay for the advantages gained. To keep down the size of the code as
much as possible, GoAsm takes a number of opportunities to optimise the code particularly
when inserting the parameters. See some optimisation done by GoAsm for
details. See also coding to achieve automatic stack alignment.
Using the same source code for both 32 and 64-bits
The GoAsm manual describes the use of ARG and INVOKE in the section dealing
with calls to Windows APIs in 32-bits and 64-bits and the use
of FRAME...ENDF in the section dealing with
callback stack frames in 32-bits and 64-bits. GoAsm's ARG and INVOKE
and FRAME...ENDF constructs effectively deal with the changes in the calling convention in 64-bit
programming.
Bringing together all those considerations and also those set out above, it is perfectly possible
to use the same source code to create executables for both 32-bit and 64-bit platforms.
To recap, here are the rules which must be followed to do this:-
- When calling APIs use INVOKE in your code instead of CALL.
- When passing parameters to APIs use ARG in your code instead of PUSH, alternatively
give the parameters after INVOKE.
- Use FRAME .. ENDF in your code when using LOCAL data or picking up parameters sent to a window
procedure (or other similar callback procedure).
- If you want to use the new registers R8-R15 or XMM8-XMM15, or the new 8, 16 and 32-bit forms of
registers (such as SIL, DIL, BPL, SPL, R8W and R8D), make sure they are used only within switched
64-bit source code using conditional assembly.
- Use the 64-bit form of the general purpose registers (RAX,RBP,RBX,RCX,RDX,RDI,RSI, and RSP)
for pointers. When GoAsm assembles for 32-bit, it will automatically reduce these
registers to their 32-bit counterparts.
- If you have used PUSHFD and POPFD to save and restore the flags, change this to
PUSHF and POPF or PUSH FLAGS and POP FLAGS.
- Ensure that structures, data sizes, and type indicators are correct for 32/64-bit use, if necessary
by using conditional assembly.
- Use /x64 in the command line to create a 64-bit executable, and /x86 in the command line
to create a 32-bit executable.
The "Go" tools will do the rest of the work.
Note that /x86 should not be used in the command line for Win32 source code (use it only for
32/64-bit switchable source code).
See the file Hello64World3 for example source code which can make
either a simple Win32 "Hello World" Window program or a Win64 one.
Converting existing 32-bit code to 64-bit
Bringing together all the above considerations, this is what you need to do to convert existing
32-bit source code to 64-bit source.
- Change all CALLs to APIs to INVOKE. Do not change any CALLs to non-APIs.
- If you have used PUSH to send parameters to an API in your 32-bit source, change this to
ARG. Do not use ARG for any other PUSHes.
- Change all the 32-bit general purpose registers used as pointers (that is, within
square brackets) to their 64-bit counterparts (RAX,RBP,RBX,RCX,RDX,RDI,RSI, and RSP). This
will keep your code shorter, and ensure that pointers to external data work properly.
Remember also to use only RSI, RDI and RCX with your string instructions and repeat prefixes.
See choice of registers.
- Ensure that registers which contain system handles and other values provided by the system
are changed to their 64-bit counterparts (RAX,RBP,RBX,RCX,RDX,RDI,RSI, and RSP).
- Adjust all other register use as required. Generally for other use, the existing
registers will work perfectly well, but do not mix the use of 32-bit and 64-bit registers
because of zero-extension of results. There is no need to change
PUSHes and POPs of registers. These changes are done automatically by GoAsm because the opcodes
are the same (for example PUSH EAX is regarded the same as PUSH RAX and vice versa).
- Ensure that structures, data sizes, and type indicators are correct for 64-bit use.
- Check that your JECXZ instructions are changed to JRCXZ if appropriate.
- Since 64-bit code tends to be a little larger than 32-bit code, when you
re-assemble your code using the /x64 switch, you may find that some
short jumps have to be re-organised.
AdaptAsm can do some of the above work for you.
Using AdaptAsm.exe to help with the conversion
AdaptAsm comes packaged with GoAsm and I originally wrote it to help to convert
source code used for other assemblers to GoAsm syntax. I have now extended it to
help towards the conversion of 32-bit source code to 64-bit source code. This works
both on GoAsm source code and also source code for other assemblers.
For full details of AdaptAsm's other rôles see
the GoAsm manual.
You use AdaptAsm from the command line using the following:-
AdaptAsm [command line switches] inputfile[.ext]
If no input extension is specified, .asm is assumed.
If no output extension is specified, .adt is assumed.
The command line switches are:-
/h=this help
/a=adapt a386 file
/m=adapt masm file
/n=adapt nasm file
/fo=specify output path/file eg. /fo GoAsm\adapted.asm
/l=create log output file
/o=don't ask before overwriting input file
/x64=adapt file for 64-bits
What AdaptAsm does when helping to adapt a file to 64-bits using the /x64 switch
- CALLs to APIs are changed to INVOKE (CALLs to non-APIs are not affected).
AdaptAsm does this by looking at lists of APIs in ".h.txt" files in the same
folder as AdaptAsm.exe. See the ".h.txt" files for
more information about these files.
This works with all types of calls even if enclosed in square brackets and
even if dependent on a define (equate) or a switch, for example:-
CALL ExitProcess ;changed to INVOKE
CALL [ExitProcess] ;changed to INVOKE
CALL INTERNAL_PROC ;not changed
CALL SendMessage ;changed to INVOKE
CALL SendMessageA ;changed to INVOKE
CALL SendMessageW ;changed to INVOKE
CALL SendMessage##AW ;changed to INVOKE
- PUSH is changed to ARG for the parameters sent to the API. AdaptAsm does this by
counting the correct number of parameters back from the CALL and comparing this with
the correct number of parameters in the lists of APIs in ".h.txt" files in the same
folder as AdaptAsm.exe. See the ".h.txt" files for
more information about these files.
Here are some simple examples:-
PUSH EBX,0,1100h,[hMessTV] ;PUSH is changed to ARG (and EBX changed to RBX)
CALL SendMessageA ;CALL is changed to INVOKE
PUSH EBX,0 ;PUSH is changed to ARG (and EBX changed to RBX)
PUSH 1100h ;PUSH is changed to ARG
PUSH [hMessTV] ;PUSH is changed to ARG
CALL SendMessageA ;CALL is changed to INVOKE
You may have preserved registers across API calls and these are unaffected, for example:-
PUSH EAX ;PUSH not changed (but EAX changed to RAX)
PUSH EBX,0,1100h,[hMessTV] ;PUSH is changed to ARG (and EBX changed to RBX)
CALL SendMessageA ;CALL is changed to INVOKE
POP EAX ;POP not changed (but EAX changed to RAX)
However, if you have mixed these two uses of PUSH AdaptAsm will show an error by
changing the PUSH to ARG and noting the problem in the log file:-
PUSH EAX,EBX,0,1100h,[hMessTV] ;PUSH is changed to ARG (too many parameters)
CALL SendMessageA ;CALL is changed to INVOKE
POP EAX ;restore eax register
If AdaptAsm cannot find all the expected parameters it shows an error by changing
the CALL to INVOKE and noting the problem in the log file, for example:-
CALL INTERNAL_PROC ;not changed
PUSH 0,1100h,[hMessTV] ;PUSH is changed to ARG
CALL SendMessageA ;CALL is changed to INVOKE (too few parameters)
This means that the following, which could be done in 32-bits, will show up as
an error in AdaptAsm (and rightly so, since in 64-bit assembler each CALL must
immediately follow its parameters):-
PUSH 0,EAX,14Eh,[hComboSev] ;14Eh=CB_SETCURSEL
PUSH 0,EAX,151h,[hComboSev] ;151h=CB_SETITEMDATA
CALL SendMessageA
CALL SendMessageA
- 32-bit general purpose registers in square brackets are changed to their 64-bit
counterparts so that they can be used for both 32-bit and 64-bit assembly, for example:-
MOV EAX,[EAX+EBX] ;changed to MOV EAX,[RAX+RBX]
MOV D[EBX*8+EBP],8h ;changed to MOV D[RBX*8+RBP],8h
CALL [EBX] ;changed to CALL [RBX]
INVOKE ExitProcess,[EBX] ;changed to INVOKE ExitProcess,[RBX]
PUSH [EBX] ;changed to PUSH [RBX] or ARG [RBX]
POP [EBX] ;changed to POP [RBX]
- Where a pointer is used with a 32-bit general purpose register, the register is changed to
its 64-bit counterpart, for example:-
MOV EAX,ADDR THING ;changed to MOV RAX,ADDR THING
CMP ESI,ADDR THING ;changed to CMP RSI,ADDR THING
MOV EBP,OFFSET THING ;changed to MOV RBP,OFFSET THING
LEA EAX,THING ;changed to LEA RAX,THING
- Although not strictly necessary, for good measure 32-bit general purpose registers after
PUSH, POP and INVOKE are changed to their 64-bit counterparts, for example:-
PUSH EAX,EBX ;changed to PUSH RAX,RBX
POP EBX,EAX ;changed to POP RBX,RAX
INVOKE ExitProcess,EBX ;changed to INVOKE ExitProcess,RBX
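The parameter-count check described above can be pictured with the following Python sketch. It is a simplified, hypothetical model, not AdaptAsm's source. It ignores defines, switches and bracketed CALLs, and it only classifies the PUSHes rather than rewriting them. In AdaptAsm the expected count comes from an h.txt file.

```python
def check_api_call(preceding_lines, expected):
    """Classify the PUSHes in front of a CALL to an API that takes `expected` parameters.

    preceding_lines holds the source lines before the CALL, in source order.
    The model walks back over consecutive PUSH lines, counting comma-separated
    operands, and stops as soon as `expected` operands have been found.
    Returns "ok", "too many" or "too few".
    """
    found = 0
    for line in reversed(preceding_lines):
        if found == expected:
            return "ok"                        # earlier PUSHes (saved registers) are left alone
        code = line.split(";", 1)[0].strip()   # drop any comment
        parts = code.split(None, 1)            # mnemonic, operands
        if len(parts) < 2 or parts[0].upper() != "PUSH":
            return "too few"                   # reached something other than a PUSH
        found += len(parts[1].split(","))
        if found > expected:
            return "too many"                  # one PUSH line mixes saved registers and parameters
    return "ok" if found == expected else "too few"


print(check_api_call(["PUSH EAX", "PUSH EBX,0,1100h,[hMessTV]"], 4))        # ok
print(check_api_call(["PUSH EAX,EBX,0,1100h,[hMessTV]"], 4))                # too many
print(check_api_call(["CALL INTERNAL_PROC", "PUSH 0,1100h,[hMessTV]"], 4))  # too few
```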
What AdaptAsm does not do (and you need to do by hand)
- AdaptAsm cannot decide for you which register to use in other circumstances. You will
have to decide this on a case-by-case basis; see choice of registers
for some guidance on this.
- AdaptAsm does not ensure that structures and data sizes are correct for 64-bit use, nor that
the pointers to structures and strings are properly aligned.
The "h.txt" files used by AdaptAsm with the /x64 switch
These files are text files containing lists of APIs and the number of parameters
required by each API. AdaptAsm looks inside its own folder for such h.txt files.
The "h.txt" files are created from Microsoft header
files using a clever javascript file ApiParamCount.js, written by Leland M George of
West Virginia, who has kindly donated it to the public domain. This js file is shipped
with AdaptAsm together with some ready-made h.txt files containing the most commonly
used APIs. If your program uses APIs declared in other header files you can make your
own "h.txt" files using the js file. There are two ways to use the js file:-
Alternatively you can make your own h.txt file or edit the existing ones. The format
is as follows:-
- The first API name must start at the beginning of the file and subsequent ones
at the beginning of a line.
- New lines are made using carriage return (ascii 13) followed by linefeed (ascii 10).
- A comma immediately follows the API name.
- The number of parameters required by the API immediately follows the comma and
is written as a decimal number in ASCII digits. If the API does not take any parameters the number is
zero.
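The format is simple enough to read back in a few lines. This Python sketch is a hypothetical reader, not part of AdaptAsm. It applies the rules above and rejects lines that break them. The sample entries were written by hand for illustration; the parameter counts are those of the Windows APIs named.

```python
def parse_h_txt(data):
    """Parse h.txt content (bytes) into {api_name: parameter_count}.

    Follows the format described above: one "Name,count" entry per line,
    lines separated by CR LF, count written in ASCII decimal digits.
    """
    counts = {}
    for line in data.decode("ascii").split("\r\n"):
        if not line:
            continue                           # allow a final CR LF
        name, comma, count = line.partition(",")
        if not name or not comma or not count.isdigit():
            raise ValueError(f"malformed h.txt line: {line!r}")
        counts[name] = int(count)
    return counts


sample = b"ExitProcess,1\r\nGetTickCount,0\r\nSendMessageA,4\r\nCreateWindowExA,12\r\n"
print(parse_h_txt(sample))
```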
Switching using x64 and x86 in conditional assembly
As well as switching to 64-bit or 32-bit assembly, specifying /x64 or /x86 in GoAsm's command line
also permits these words to be tested in conditional assembly. So, for example, you can switch
two different generalised window procedures in this way:-
WndProcTable:
#if X64
MOV EAX,ADDR MESSAGES ;give eax the list of messages to deal with
CALL GENERAL_WNDPROC64 ;call the generic message handler (64-bit version)
#else
MOV EDX,ADDR MESSAGES ;give edx the list of messages to deal with
CALL GENERAL_WNDPROC ;call the generic message handler (32-bit version)
#endif
RET
Note that the words "x64" and "x86" are not case sensitive.
Here is another example to switch include files including structures:-
#if X64
#include structures64.inc
#else
#include structures32.inc
#endif
Some pitfalls to avoid when converting existing source code
- Forgetting that API parameters are always qwords.
Your existing 32-bit source code will have been written on the correct assumption that each
parameter is a dword. For example:-
ARG 4000h,[SYSTEM_INFO+4h],[MEMORY_END]
INVOKE VirtualFree ;decommit a page of memory
In 32-bits this is good coding because there is a dword at [SYSTEM_INFO+4h] (the dword here holds
the system's memory page size; this assumes the structure was filled in using a call to the
GetSystemInfo API).
In 64-bits this is bad because the value at +4h is still a dword, but you are now sending a
qword to VirtualFree and not just a dword. This should be coded as follows instead:-
XOR RAX,RAX ;zero rax
MOV EAX,[SYSTEM_INFO+4h] ;get page size into lower 32-bits of rax
ARG 4000h,RAX,[MEMORY_END]
INVOKE VirtualFree ;decommit a page of memory
Note that in practice, because the MOV EAX line itself zeroes the top part of RAX, you could remove
the first line of this example altogether!
A similar problem arises when interrogating the system and receiving information into data.
Your existing 32-bit code may well look something like this:-
ARG 0,ADDR SIZEOF_WORKAREA,0,48 ;48=SPI_GETWORKAREA (excluding tray)
INVOKE SystemParametersInfoA ;get size of work area into SIZEOF_WORKAREA
Here the call puts a 32-bit value into the dword SIZEOF_WORKAREA which is correct. However
assembling and running the same code in a 64-bit system would overwrite the next dword in
memory as well (a qword is sent not a dword). So you need to enlarge SIZEOF_WORKAREA to
a qword.
- Forgetting that indirect calls now use 64-bit addresses.
This can easily be forgotten when using tables to control movement of execution around
your code. Take the case of a simple table of labels for example:-
DATA
Table DD CODELABEL,2h
CODE
CALL [Table]
or
DATA
Table DD CODELABEL,2h
CODE
MOV RSI,ADDR Table
CALL [RSI]
This will call a 64-bit address with CODELABEL's address in the low dword and 2 in the high
dword. This will produce an error at run-time. The solution for internal calls is to code as
follows:-
DATA
Table DQ CODELABEL,2h
CODE
CALL [Table]
or
DATA
Table DD CODELABEL,2h
CODE
MOV RSI,ADDR Table
XOR RAX,RAX
MOV EAX,[RSI]
CALL RAX
This code ensures that the high dword of the 64-bit address holds zero. This works because
all pointers to internal data and code labels are 32-bits.
- Forgetting that all Windows handles are now 64-bit values.
In Win64, system handles are enlarged to 64-bits so it is unsafe to assume that they
will always fit into 32-bits.
So this means that:-
ARG 32512 ;IDC_ARROW common cursor
INVOKE LoadCursorA,0 ;get in eax, handle to arrow cursor
MOV [WNDCLASS+28h],EAX ;and give to WNDCLASS
is bad 64-bit coding, whereas
ARG 32512 ;IDC_ARROW common cursor
INVOKE LoadCursorA,0 ;get in rax, handle to arrow cursor
MOV [WNDCLASS+28h],RAX ;and give to WNDCLASS
is correct.
- Forgetting that all POPs are now to qwords.
Your existing 32-bit source code may POP into dwords in memory. For example:-
DRAW_RECTANGLE:
PUSH [RECT],[RECT+4] ;save left and top of rectangle
; code to adjust rectangle
; and then draw it
POP [RECT+4],[RECT] ;restore top and left of rectangle for future use
RET
In 64-bits a RECT structure is still 4 dwords just as it was in 32-bits. However
the first POP in the above code (POP [RECT+4]) would rub out the third dword in the structure
(right) because the POP is in fact 64-bits, not 32-bits.
Correct coding for 64-bits would be:-
DRAW_RECTANGLE:
PUSH [RECT],[RECT+4] ;save left and top of rectangle
; code to adjust rectangle
; and then draw it
POP RAX ;restore top of rectangle for future use
MOV [RECT+4],EAX ;insert dword only
POP RAX ;restore left of rectangle for future use
MOV [RECT],EAX ;insert dword only
RET
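Three of these pitfalls come down to a qword access at an address that holds only a dword: the VirtualFree parameter, the DD call table and the POP into a RECT. The following Python sketch makes the byte arithmetic visible using little-endian packing. All addresses and values in it are made-up examples.

```python
import struct

# SYSTEM_INFO fragment: dwPageSize (dword at +4h) is followed at +8h by
# lpMinimumApplicationAddress, a qword in 64-bit Windows (example values)
sysinfo = bytearray(16)
struct.pack_into("<I", sysinfo, 4, 0x1000)
struct.pack_into("<Q", sysinfo, 8, 0x10000)
(qword_param,) = struct.unpack_from("<Q", sysinfo, 4)   # ARG [SYSTEM_INFO+4h] reads 8 bytes
(dword_param,) = struct.unpack_from("<I", sysinfo, 4)   # MOV EAX,[SYSTEM_INFO+4h] reads 4

# Table DD CODELABEL,2h: a qword CALL [Table] combines both dwords (example address)
table = struct.pack("<II", 0x401000, 2)
(call_target,) = struct.unpack_from("<Q", table, 0)

# POP [RECT+4] writes a whole qword, so it also restores the old third dword (right)
rect = bytearray(struct.pack("<4i", 10, 20, 300, 400))   # left, top, right, bottom
(saved,) = struct.unpack_from("<Q", rect, 4)             # PUSH [RECT+4]
struct.pack_into("<i", rect, 8, 350)                     # code adjusts right
struct.pack_into("<Q", rect, 4, saved)                   # POP [RECT+4]
right_after_pop = struct.unpack_from("<i", rect, 8)[0]

print(hex(qword_param), hex(dword_param), hex(call_target), right_after_pop)
```

The qword parameter carries part of the next structure member in its high dword. The call target has 2 in its high dword. The adjusted right edge (350) is lost.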
Assembling and linking to produce the executable
To make a 64-bit object file with GoAsm use this command line:-
GoAsm /x64 filename
where filename is the name of your asm file written either as a 64-bit source file or
a 32/64 switchable source file. Use /x86 instead of /x64 when assembling a 32/64 switchable
source file to make a 32-bit version.
The object file created by GoAsm can be sent to GoLink or another linker in the usual way.
GoLink automatically senses whether the object file is 32 or 64-bit and creates the
correct type of executable to suit.
You cannot mix 32-bit and 64-bit object files. GoLink will show an error if you try to
do this.
You do not necessarily need to make 64-bit executables on a 64-bit machine. This is because the
DLL names given to GoLink simply tell the linker that the DLL contains the APIs used by
the application and these tend to be the same between the two platforms. If your application
calls APIs which exist only on 64-bit Windows, however, this does not work.
Some optimisation and refinement done by GoAsm
GoAsm always aims to produce the tightest possible code from your source. In the case of x64,
GoAsm has not yet taken up all opportunities to optimise the code. This is because there are still
some unknowns, such as effects on performance of optimised code on x64.
The optimisations and refinements are listed here to help you when you look at the code produced
by GoAsm in the debugger.
GoAsm optimisations and refinements in all code
None of these affect the flags or adversely affect performance.
Additional optimisations and refinements only when INVOKE is used
These may affect the flags, which does not matter when calling an API. Those that rely on
zero-extension may require another operation from the processor, but it
is assumed that this does not matter when calling an API. It is more important to keep the
code size down.
- A register parameter containing zero is optimised using XOR 32-bit register. This is a
saving of between 7 and 8 bytes over the MOV equivalent.
- A register parameter containing a number (an "immediate") which can fit into 32-bits is
changed to use a 32-bit register, saving between 1 and 5 bytes depending on the register and
the number.
- A register parameter containing -1 is achieved by using OR 64-bit register,-1 saving
6 bytes.
- If the parameter is already in the correct register no further code is emitted
because it is not required.
- The coding to achieve automatic stack alignment and to adjust the
stack for the FASTCALL calling convention is one of the following (which one is
used depends on whether an even or an odd number of parameters is passed on the stack):-
PUSH RSP ;save current RSP position on the stack
PUSH [RSP] ;keep another copy of that on the stack
AND SPL,0F0h ;adjust RSP to align the stack if not already there
;
; parameters dealt with here
;
SUB RSP,20h ;adjust RSP to provide placeholders
CALL TheAPI
ADD RSP,xxh ;get RSP back to correct place for next
POP RSP ;restore RSP to its original value
or
PUSH RSP ;save current RSP position on the stack
PUSH [RSP] ;keep another copy of that on the stack
OR SPL,8h ;adjust RSP to align the stack if not already there
;
; parameters dealt with here
;
SUB RSP,20h ;adjust RSP to provide placeholders
CALL TheAPI
ADD RSP,xxh ;get RSP back to correct place for next
POP RSP ;restore RSP to its original value
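The two listings can be checked with a small simulation. The Python sketch below models them and does not reproduce GoAsm's actual output. It makes two assumptions:
- The AND SPL,0F0h form is chosen when an even number of parameters goes on the stack, and the OR SPL,8h form when the number is odd.
- xxh in ADD RSP,xxh is 20h plus 8 per stack parameter, plus a further 8 in the AND form.

Under those assumptions RSP is 16-byte aligned at the CALL and is restored exactly by POP RSP. This works because RSP always ends up at one of the two saved copies of its original value.

```python
def invoke_stack(rsp, n_stack_params):
    """Simulate the alignment code around CALL TheAPI.

    rsp is RSP before the code runs (a multiple of 8); n_stack_params is the
    number of parameters pushed (those after the fourth).
    Returns (rsp_at_call, rsp_after_pop).
    """
    mem = {}

    def push(value):
        nonlocal rsp
        rsp -= 8
        mem[rsp] = value

    push(rsp)                     # PUSH RSP: pushes the value RSP had before the decrement
    push(mem[rsp])                # PUSH [RSP]: address computed before the decrement
    even = n_stack_params % 2 == 0
    if even:
        rsp &= ~0xF               # AND SPL,0F0h: round RSP down to a multiple of 16
    else:
        rsp |= 0x8                # OR SPL,8h: make RSP 8 more than a multiple of 16
    for i in range(n_stack_params):
        push(f"param{i + 5}")     # placeholder values for parameters 5 onwards
    rsp -= 0x20                   # SUB RSP,20h: four placeholders
    rsp_at_call = rsp             # CALL pushes a return address and the API's RET pops it
    rsp += 0x20 + 8 * n_stack_params + (8 if even else 0)   # ADD RSP,xxh (assumed value)
    rsp = mem[rsp]                # POP RSP
    return rsp_at_call, rsp


for start in (0x1000, 0x1008):
    for n in (0, 1, 2):
        at_call, after = invoke_stack(start, n)
        print(f"RSP={start:X}h, {n} stack params: at CALL {at_call:X}h, after {after:X}h")
```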
Some tips to reduce the size of your code
Note that some of these optimisations may adversely affect performance.
- Using the 64-bit registers (RAX to RSP) as pointers to memory (for example MOV [RSI],AL)
saves a byte over using the 32-bit versions (for example MOV [ESI],AL). This is because in such
instructions a 67h override byte is needed for the 32-bit version.
- The opposite is the case when you use registers to hold immediates (numbers). In those
cases using the enlarged registers (RAX to RSP) and the extended registers (R8 to R15) or any
of the new register addressing methods, adds at least a byte to each instruction. For
example, MOV RAX,23456h is 2 bytes larger than MOV EAX,23456h. The contrast is even greater
using larger numbers which are above 7FFFFFFFh because these have to be coded as full 64-bit
numbers if you use a 64-bit register. So for example MOV RAX,80234560h codes 5 bytes larger
than MOV EAX,80234560h. If the number you wish to move will fit into a byte, then even greater
savings can be achieved, for example MOV AL,88h codes as 2 bytes, but MOV RAX,88h is 10 bytes.
- DEC and INC (with a register) now take at least two bytes, whereas in 32-bit code they were very
frugal, using only one byte. But there is still an advantage in using them over SUB register,1
or ADD register,1 which is one byte longer. SUB or ADD can still be used if you need to test
the carry flag after the instruction.
- In 64-bit programming LEA register,Label is 5 opcodes shorter than
MOV register,ADDR Label yet they achieve the same result. In GoAsm source code however, you
can use either since GoAsm automatically uses the shortest form.
- PUSH ADDR THING codes as 9 bytes, whereas if you use LEA RAX,THING followed by PUSH RAX instead,
this is 8 bytes. However, it changes the content of the RAX register.
- Zero a register using XOR. XOR RAX,RAX is 3 bytes, whereas MOV RAX,0 is 10 bytes (because
the instruction takes a 64-bit immediate value). However, XOR affects the flags, MOV
does not.
- XOR EAX,EAX is even shorter at 2 bytes and it does zero the whole RAX register. See
zero-extension of results.
- A good way to fill a register with -1, is to use OR register,-1 which in the case
of a 64-bit register is 4 bytes, a saving of 6 bytes over MOV register,-1. However, OR
affects the flags, but MOV does not.
- Compares in the range -80h to +7Fh code as 4 bytes (eg. CMP RDX,-80h to CMP RDX,7Fh) but outside
that range they code as 7 bytes (so eg. CMP RDX,80h is 7 bytes).
- You can still use LEA to do intra-register arithmetic for example LEA RAX,[RAX+RAX*2] which
multiplies RAX by three. This codes as 4 bytes.
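The size difference for compares comes from two encodings of CMP with an immediate. One takes a sign-extended byte and the other a sign-extended dword. The Python sketch below is a hypothetical helper, not part of any tool. It gives the size of CMP with a 64-bit register other than RAX. RAX also has a one-byte-shorter dword form (48 3D id), which the sketch leaves out.

```python
def cmp_r64_imm_size(imm):
    """Encoded size in bytes of CMP r64,imm for a register other than RAX.

    REX.W 83 /7 ib takes a sign-extended byte (4 bytes in all);
    REX.W 81 /7 id takes a sign-extended dword (7 bytes in all).
    """
    if -0x80 <= imm <= 0x7F:
        return 4
    if -0x8000_0000 <= imm <= 0x7FFF_FFFF:
        return 7
    raise ValueError("CMP has no 64-bit immediate form; load the value into a register first")


for value in (-0x80, 0x7F, 0x80, 0x7FFF_FFFF):
    print(f"CMP RDX,{value:X}h -> {cmp_r64_imm_size(value)} bytes")
```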
See also general tips for programming in GoAsm help.
More information, references and links
Information about the AMD64
AMD information for developers
AMD and industry partners' AMD64 site
František Gábriš (much early 64-bit work, including sample source code)
Intel 64 Technology site
Newsgroups and forums:-
64-bit assembler forum
AMD developer forum
Planet 64
Extended 64
Start64 forum
Copyright © Jeremy Gordon 2006/9