Tomaitheous Elder

Joined: 27 Sep 2005 Posts: 306 Location: Tucson
Posted: Sat May 20, 2006 5:40 am Post subject: operating under the T flag
A few questions about the T flag.
Are there additional cycles attached to the AND, OR, ADC, etc. instructions when the T flag is set?
Does the value already in the A register automatically transfer into the ZP address when the T flag is set?
How do the non-logic instructions (e.g. tay, tam, etc.) react under the T flag?
-Rich _________________ www.pcedev.net
 |
Charles MacDonald Member

Joined: 07 Dec 2005 Posts: 35
Posted: Sun May 21, 2006 4:24 am Post subject: Re: operating under the T flag
Quote: Are there additional cycles attached to AND, OR, ADC, etc instructions when the T flag is set?
I would imagine at the very least 2 cycles are added to read the zero page byte before the operation and write it back afterwards.
Quote: Does the value already existing in the A register automatically transfer into the ZP address on setting the T flag?
Not quite. When a logic instruction is prefixed by SET, the operation that would take place on the accumulator is instead done to a zero page memory location indexed by the X register. The accumulator is not affected in any way.
Quote: How do the non logic instructions react under the T flag (i.e tay, tam, etc) ?
All other instructions operate normally and are not affected by the T flag being set. Note that because of this you have to be careful of the order in which the T flag is set:
ldx #$80
clc
set
adc #$01 ; OK, adds 1 to memory address $2080
as opposed to
ldx #$80
set
clc ; T flag cleared here
adc #$01 ; adds 1 to accumulator, oops
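The ordering rule in the two sequences above can be sketched as a toy model in Python. This is only an illustration, not an emulator: the `run` function, the tuple-based instruction list, and the dictionary used for memory are hypothetical. It assumes only what the post states, namely that SET redirects the next ADC to the zero page byte at $2000+X, and that any other instruction clears T.

```python
# Toy model of SET/ADC ordering on the HuC6280 (illustrative only).
# Assumptions: zero page lives at $2000, SET affects only the very next
# instruction, and every instruction other than SET leaves T cleared.

ZP_BASE = 0x2000

def run(program, memory=None):
    """Execute a list of (mnemonic, operand) tuples; return (a, memory)."""
    memory = dict(memory or {})
    a, x, carry, t = 0, 0, 0, False
    for op, arg in program:
        if op == "ldx":
            x = arg
        elif op == "clc":
            carry = 0
        elif op == "adc":
            if t:
                # T set: operate on the zero page byte indexed by X.
                addr = ZP_BASE + x
                total = memory.get(addr, 0) + arg + carry
                memory[addr] = total & 0xFF
            else:
                # T clear: ordinary accumulator add.
                total = a + arg + carry
                a = total & 0xFF
            carry = total >> 8
        elif op != "set":
            raise ValueError(f"unsupported instruction: {op}")
        # T survives only if the instruction just executed was SET.
        t = (op == "set")
    return a, memory

good = [("ldx", 0x80), ("clc", None), ("set", None), ("adc", 0x01)]
bad = [("ldx", 0x80), ("set", None), ("clc", None), ("adc", 0x01)]

print(run(good))  # the byte at $2080 changes, A does not
print(run(bad))   # A changes, memory does not
```

Running both lists shows the difference: in `good` the add lands in memory, while in `bad` the CLC sits between SET and ADC, so the add falls through to the accumulator.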
 |
Tomaitheous Elder

Joined: 27 Sep 2005 Posts: 306 Location: Tucson
Posted: Sun May 21, 2006 7:14 pm Post subject:
Looking through the instruction set, I see that every instruction clears the T flag. So I guess that doesn't make it as useful as I thought. I should have just looked at your doc, as it has all the info I was looking for.
Is the use of the T flag and the behavior of the corresponding instructions officially documented? Or is this similar to the case of the Hitachi 6309's native mode, which was not documented until years later?
I almost forgot. Charles, in your 'pcetech' you mentioned that there is one additional clock cycle/wait state added to any instruction that accesses the VDC and VCE. Is this factored into the block transfer instructions rated at 6 clock cycles per byte, or do they actually take 7 clock cycles per byte when pointed at the VDC/VCE?
-Rich _________________ www.pcedev.net
 |
Charles MacDonald Member

Joined: 07 Dec 2005 Posts: 35
Posted: Sun May 21, 2006 10:28 pm Post subject:
Quote: Is the use of the T flag and the behavier of the corrosponding instruction officially documented? Or is this similar in case of the hitachi 6309 native mode not being documented until years later.
Yes, the function of the T flag (and all the non-standard features added to the HuC6280 on top of the base 65C02 feature set) has been documented in the Develo Book and in the PC-Engine developer manuals. Actually getting ahold of either one is another story, so for regular people like myself there is a bit of mystery surrounding such things.
Quote: I almost forgot. Charles, in your 'pcetech' you mentioned that there is one additional clock cycle/wait state added to any instruction that accesses the VDC and VCE. Is this factored into the block transfer instructions rated at 6 clock cycles per byte, or is it actually taking 7 clock cycles per byte when pointed to the VDC/VCE?
I can't quite figure this one out. Way back when, in a timing test, I found that any access to the entire $0000-$03FF range (VDC) or $0400-$07FF range (VCE) took one extra cycle, whether it was a read or a write, for program execution or just regular memory access.
Recently I've been doing some work on the hardware side and found that the VDC can control the HuC6280's WAIT signal to delay processing. However, it seems to do that only during VRAM reads and writes. That explains where the extra cycle comes from, but then you'd think addresses like $0000/$0001 wouldn't be affected, and they too have a 1 cycle delay. Plus the VCE can't even control WAIT, so the VCE delay seems impossible.
Basically I need to do more testing before I can give a definitive answer. However, I think it is highly likely that there is one extra cycle for every transfer, so it would be 7 clock cycles per byte. I'll check all this stuff Real Soon Now.
 |
Tomaitheous Elder

Joined: 27 Sep 2005 Posts: 306 Location: Tucson
Posted: Tue May 23, 2006 5:13 am Post subject:
Quote: I'll check all this stuff Real Soon Now.
Cool! I guess I'll try some clock tests too, since I'm curious whether there's any delay difference between the SGX and PCE when accessing the VDC/VCE.
Quote: Plus the VCE can't even control WAIT so the VCE delay seems impossible.
Was this tested on both a PCE and an SGX? _________________ www.pcedev.net
 |
Charles MacDonald Member

Joined: 07 Dec 2005 Posts: 35
Posted: Mon May 29, 2006 6:42 pm Post subject:
Tomaitheous wrote: Cool! I guess I'll try some clock tests too since I'm curious if there's any delay difference between the SGX and PCE when accessing the VDC/VCE.
I can confirm as a matter of fact that any access to $0000-$03FF (VDC) or $0400-$07FF (VCE) takes exactly 1 extra cycle each. For example LDA $0000 is 6 cycles instead of 5 on a regular CoreGrafx II (standard PCE chipset). I'll dig up the SuperGrafx and check on that later.
Note that this extra cycle penalty applies to each access to a VDC/VCE address: an RMW instruction would be +2 cycles (+1 for the read and +1 for the write), and a block transfer instruction would be +1 for each source read and/or +1 for each destination write that accesses those areas.
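The penalty rule can be written out as a small Python sketch. The function names are hypothetical, and the 6 cycles per byte base figure is the one quoted earlier in this thread; any fixed per-instruction overhead of a block transfer is deliberately left out because the thread does not state it.

```python
# Sketch of the +1 cycle penalty per VDC/VCE access described above.
# Assumptions: addresses are as seen by the CPU with the hardware page
# mapped at $0000, and the block transfer base cost is 6 cycles per byte.

VDC_RANGE = range(0x0000, 0x0400)
VCE_RANGE = range(0x0400, 0x0800)

def access_penalty(addr):
    """Return 1 if addr falls in the VDC or VCE range, else 0."""
    return 1 if (addr in VDC_RANGE or addr in VCE_RANGE) else 0

def block_transfer_cycles_per_byte(src, dst, base=6):
    """Per-byte cost: base plus one cycle for each penalized access."""
    return base + access_penalty(src) + access_penalty(dst)

def rmw_penalty(addr):
    """A read-modify-write touches the address twice (read, then write)."""
    return 2 * access_penalty(addr)

print(5 + access_penalty(0x0000))                      # LDA $0000: 5 + 1
print(block_transfer_cycles_per_byte(0x2200, 0x0404))  # RAM to VCE
print(rmw_penalty(0x0000))                             # RMW on a VDC address
```

Under these assumptions a RAM-to-VCE block transfer costs 7 cycles per byte, and a transfer whose source and destination both sit in the VDC/VCE ranges would cost 8.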
I've got this great system for automatically timing code sequences working at the moment. Are there any other timings you wanted checked? I was going to look at CSL/CSH and the overhead from having the T flag set.
 |
Tomaitheous Elder

Joined: 27 Sep 2005 Posts: 306 Location: Tucson
Posted: Tue May 30, 2006 12:29 am Post subject:
Quote: Note that this extra cycle penalty is for each access to a VDC/VCE address; so a RMW instruction would be +2 cycles (+1 for read and +1 for write), a block transfer instruction would be +1 for each source read and/or +1 for each destination write that access those areas.
Hmm, that has me thinking: is the +1 penalty for writing to the VDC per byte or per word? I guess that would depend on which side the latch is on. If it's on the CPU side, then it should be +1 for each word? Same with the VCE?
I'll have to check again, but I remember having a test demo that could write about 5-6 colors to the VCE before the VDC goes active on a scanline, by starting the block transfer at the beginning of the h-sync interrupt (@ 5 MHz with a centered 256 active pixel setup). Even without the +1 delay on each access, it should only be able to update 2-3 colors at most, unless the hsync interrupt is being generated before the start of the next line, maybe at the end of the VDC's active display on the previous line. Any ideas?
Also, is CSH/CSL really changing the clock speed, or just inserting/enabling wait states?
-Rich _________________ www.pcedev.net
 |
Tomaitheous Elder

Joined: 27 Sep 2005 Posts: 306 Location: Tucson
Posted: Sun Jun 04, 2006 10:12 pm Post subject:
Maybe there is a good but limited use for the T flag after all.
Code:
; 8/16-bit add to a 16-bit variable in zero page
lda ZZ ; 4
clc ; 2
adc #$xx ; 2
sta ZZ ; 4
lda ZZ+1 ; 4
adc #$xx ; 2
sta ZZ+1 ; 4 total 22 cycles

; And now with the T flag.
ldx #LOW(ZZ) ; 2
clc ; 2
set ; 2
adc #$xx ; 2
inx ; 2
set ; 2
adc #$xx ; 2 total 14 cycles

; incrementing an 8-bit pointer for a 16-bit wide array
inc ZZ ; 6
inc ZZ ; 6 total 12 cycles

; the same with the T flag
ldx #LOW(ZZ) ; 2
clc ; 2
set ; 2
adc #$02 ; 2 total 8 cycles

-Rich _________________ www.pcedev.net
 |
dmichel Admin

Joined: 04 Apr 2002 Posts: 1166 Location: France
Posted: Mon Jun 05, 2006 7:19 pm Post subject:
You forgot to add the extra cycles to 'adc' when the T flag is set (the T flag adds 3 cycles).
I don't think I have ever used 'set', but it's true that it could be useful at times. _________________ David Michel
 |
Tomaitheous Elder

Joined: 27 Sep 2005 Posts: 306 Location: Tucson
Posted: Tue Jun 06, 2006 1:27 am Post subject:
Quote: You forgot to add the extra cycles to 'adc' when the T flag is set.
(the T flag adds 3 cycles)
So ADC #$xx under T flag is 5 cycles instead of 2?
From the looks of it, you save 1 cycle and 1 byte for every use of the T flag version.
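As a check on that estimate, the cycle counts from the earlier listing can be re-added with the penalty included. This is a minimal sketch in Python; the list names are made up for illustration, and `T_PENALTY = 3` assumes the 3-cycle figure quoted above is the right one.

```python
# Re-adding the per-instruction cycle counts from the earlier listing,
# this time charging each ADC executed under SET an extra T_PENALTY cycles.
# Assumption: the penalty is 3 cycles, the figure quoted in this thread.

T_PENALTY = 3

# 16-bit add: lda, clc, adc, sta, lda, adc, sta
normal_add16 = [4, 2, 2, 4, 4, 2, 4]
# T flag version: ldx, clc, set, adc, inx, set, adc
tflag_add16 = [2, 2, 2, 2 + T_PENALTY, 2, 2, 2 + T_PENALTY]

# Pointer bump by 2: inc, inc
normal_inc2 = [6, 6]
# T flag version: ldx, clc, set, adc
tflag_inc2 = [2, 2, 2, 2 + T_PENALTY]

print(sum(normal_add16), sum(tflag_add16))  # 22 20
print(sum(normal_inc2), sum(tflag_inc2))    # 12 11
```

With a 3-cycle penalty the 16-bit add drops from 22 to 20 cycles (two SET/ADC pairs) and the pointer bump from 12 to 11 (one pair), which matches the "1 cycle per use" estimate.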
Btw-
An interesting piece of info (for me at least): according to WDC, the W65C02S adds 2 cycles for the 'READ-MODIFY-WRITE' addressing mode, but the same document later mentions 3 cycles. That is probably 1 extra clock cycle for crossing a page boundary (PC fetch data), as with R-M-W absolute addressing. The W65C02S looks to be closer to the HuC6280 than the 65C02 is. This would account for _Bnu's cycle reference discrepancy in Warren Wilkinson's doc.
Hmm, it would be worth copying code into RAM ($2000), executing a test loop for LDA ZZ with a 16-bit incrementer, and comparing the TIMER difference against executing code that is not in the same bank as zero page.
*UPDATE*
I wrote the test program and tested it on an SGX and a PCE; there is no speed difference when accessing ZP across a page boundary.
I set the timer loop to 16384 cycles ($10 @ $C00) and incremented a 16-bit counter in zero page until the timer interrupt occurred.
Code:
                  MagicEngine   SGX/PCE
*normal ZP code
  ram             0x23D         0x244
  rom             0x23F         0x244
*T flag code
  ram             0x268         0x26E
  rom             0x252         0x26E
Hmm, a quick look at the math showed that the T flag on real hardware is +2 cycles for ADC, AND, etc. _________________ www.pcedev.net
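For reference, the real-hardware counts in the table can be turned into an average cost per loop iteration. This is a rough Python sketch: it takes the 16384-cycle window at face value and assumes each count is the number of completed loop iterations. The loop body itself is not shown in the post, so the result cannot be split into per-instruction penalties without further assumptions.

```python
# Average cycles per loop iteration from the SGX/PCE column of the table.
# Assumptions: the timer window is exactly 16384 CPU cycles and each count
# is the number of loop iterations completed inside that window.

WINDOW_CYCLES = 16384

hardware_counts = {"normal ZP": 0x244, "T flag": 0x26E}

def cycles_per_iteration(count, window=WINDOW_CYCLES):
    """Average CPU cycles spent per iteration of the test loop."""
    return window / count

per_iter = {name: cycles_per_iteration(c) for name, c in hardware_counts.items()}
saving = per_iter["normal ZP"] - per_iter["T flag"]

for name, value in per_iter.items():
    print(f"{name}: {value:.2f} cycles per iteration")
print(f"saving with T flag: {saving:.2f} cycles per iteration")
```

On these numbers the T flag loop comes out a little under 2 cycles per iteration faster than the normal zero page loop.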
 |
dmichel Admin

Joined: 04 Apr 2002 Posts: 1166 Location: France
Posted: Tue Jun 06, 2006 11:58 am Post subject:
Tomaitheous wrote: *UPDATE*
I wrote the test program and tested in with a SGX and PCE, there is no speed difference you accessing ZP across a page boundry.
Yup, the PCE doesn't have this one-cycle penalty.
Quote: ...
Hmm, a quick look at the math showed the T flag on the real hardware is +2 cycles for ADC,AND, etc.
Interesting... I got the 3-cycle info from the Develo Book; maybe they made a mistake. It's even more useful then. _________________ David Michel
 |
Tomaitheous Elder

Joined: 27 Sep 2005 Posts: 306 Location: Tucson
Posted: Wed Jun 07, 2006 5:32 am Post subject:
Quote: Interesting... I got the 3-cycles info from the Develo Book
Yeah, my quick math was inaccurate. It's +3 for ADC. _________________ www.pcedev.net
 |
Charles MacDonald Member

Joined: 07 Dec 2005 Posts: 35
Posted: Sat Jun 10, 2006 6:30 am Post subject:
Quote: Hmm, that has me thinking - is the +1 penalty for writing to the VDC per byte or per word? I guess that would depend on which side the latch was on. If it's on the CPU side then it should be +1 for each word? Same with the VCE?
It's not really specific to a particular VDC/VCE address, like the data port (which, as you said, has a latch on the LSB). The penalty happens for any access at all, to addresses like $0000, $0001, $0400, $0407, $07FF, etc.
Quote: Unless the hsync interrupt was/is being generated before the start of the next line - maybe at the end of the active display of the VDC of the previous line. Any ideas?
The line interrupt trigger point is relative to /HSYNC, but in the PCE the VCE provides that signal rather than the VDC. I haven't checked the timing details for this yet. For a VDC-only configuration where it generates the timing signals itself, I think the line interrupt happens fairly early in a scanline, to give as much time as possible for register changes before the next line.
Quote: Also, is the CSH/CSL really changing the clock speed or just inserting/enabling wait states?
I guess you could say the internal clock speed changes. The HuC6280 is connected to a 21 MHz clock, and the CSL/CSH instructions select whether the clock signal divided by 12 or by 3, respectively, is used for the 'CPU core' part of the chip. The PSG and timer have their own independent clocks too. If you are interested, US patent 5,483,659 discusses this in detail.
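The divider arithmetic works out as follows. This Python sketch assumes the "21 MHz" clock is the usual NTSC master clock of 21.47727 MHz, which the post does not state; the dictionary and function names are hypothetical.

```python
# CPU core clock selected by CSL/CSH, derived from the master clock.
# Assumption: the "21 MHz" clock is the NTSC master clock, 21.47727 MHz.

MASTER_CLOCK_HZ = 21_477_270

# Dividers as described in the post: CSL selects /12, CSH selects /3.
DIVIDERS = {"CSL": 12, "CSH": 3}

def cpu_clock_hz(instruction, master=MASTER_CLOCK_HZ):
    """Return the CPU core clock in Hz after CSL or CSH."""
    return master / DIVIDERS[instruction]

for name in ("CSL", "CSH"):
    print(f"{name}: {cpu_clock_hz(name) / 1e6:.2f} MHz")
```

Under that assumption CSL gives roughly 1.79 MHz and CSH roughly 7.16 MHz, a factor of four apart.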
 |
Tomaitheous Elder

Joined: 27 Sep 2005 Posts: 306 Location: Tucson
Posted: Mon Jun 12, 2006 6:59 pm Post subject:
Quote: The line interrupt trigger point is relative to /HSYNC, but in the PCE the VCE provides that signal rather than the VDC.
Since the VDC flags the interrupt, it must somehow know/assume where the HSYNC is even though it's not generating it, based on REGs $0A-$0E as if it were generating the HSYNC itself. _________________ www.pcedev.net
 |
Charles MacDonald Member

Joined: 07 Dec 2005 Posts: 35
Posted: Mon Jun 12, 2006 7:44 pm Post subject:
Tomaitheous wrote:
Quote: The line interrupt trigger point is relative to /HSYNC, but in the PCE the VCE provides that signal rather than the VDC.
Since the VDC flags the interrupt, it must somehow know/assume where the HSYNC is even though it's not generating it, based on REGs $0A-$0E as if it were generating the HSYNC itself.
Yeah, I should have been more clear about that. In the PCE, the VDC is set up so its /HSYNC and /VSYNC pins are inputs rather than outputs. It then synchronizes itself to whatever supplies those signals, which in this case is the VCE.
This comes up in odd places: for example, if a VD interrupt isn't generated during the current frame, the VDC forces one when the VCE asserts /VSYNC. Raster interrupts *may* work the same way relative to /HSYNC, though for the sake of simplicity I really hope not.