.explicit
.text
.ident	"ia64.S, Version 2.1"
.ident	"IA-64 ISA artwork by Andy Polyakov <appro@fy.chalmers.se>"

//
// ====================================================================
// Written by Andy Polyakov <appro@fy.chalmers.se> for the OpenSSL
// project.
//
// Rights for redistribution and usage in source and binary forms are
// granted according to the OpenSSL license. Warranty of any kind is
// disclaimed.
// ====================================================================
//
// Version 2.x is an Itanium2 re-tune. A few words about how Itanium2
// differs from Itanium from this module's viewpoint. Most notably, is
// it "wider" than Itanium? Can you experience the loop scalability
// discussed in the commentary sections? Not really:-( Itanium2 has 6
// integer ALU ports, i.e. it's 2 ports wider, but that's not enough to
// spin twice as fast, as I need 8 IALU ports. The number of floating
// point ports is the same, i.e. 2, while I need 4. In other words, to
// this module Itanium2 remains effectively as "wide" as Itanium. Yet
// it's essentially different with respect to this module, and a
// re-tune was required, because some instruction latencies have
// changed. The most noticeable changes affect the intensively used
// instructions:
//
//			Itanium	Itanium2
//	ldf8		9	6		L2 hit
//	ld8		2	1		L1 hit
//	getf		2	5
//	xma[->getf]	7[+1]	4[+0]
//	add[->st8]	1[+1]	1[+0]
//
// What does it mean? You might reason that the original code should
// simply run faster, because the sum of latencies is smaller... Wrong!
// Note that the getf latency increased. This means that if a loop is
// scheduled for the lower latency (as the original loops were), then
// it will suffer from stall conditions and the code will turn
// anti-scalable, e.g. the original bn_mul_words spun at 5*n ticks, or
// 2.5 times slower than expected, on Itanium2! What to do? Reschedule
// the loops for Itanium2? But then Itanium would exhibit
// anti-scalability. So I've chosen to reschedule for the worst latency
// of every instruction, aiming for the best *all-round* performance.
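//
// For illustration only, the "worst latency" rule can be written down
// as a few lines of C. This is a hypothetical sketch, not part of the
// module: the struct and function names are made up, and the bracketed
// [+n] figures in the table are read here as an extra delay that
// applies when the result feeds the instruction named in brackets.
//
#if 0
```c
/* Latencies in ticks, copied from the table above. The *_extra fields
   hold the bracketed [+n] figures (our interpretation: an additional
   delay when the result is consumed by getf or st8). */
struct latency {
    const char *insn;
    int itanium, itanium_extra;
    int itanium2, itanium2_extra;
};

static const struct latency table[] = {
    { "ldf8",      9, 0, 6, 0 },
    { "ld8",       2, 0, 1, 0 },
    { "getf",      2, 0, 5, 0 },
    { "xma->getf", 7, 1, 4, 0 },
    { "add->st8",  1, 1, 1, 0 },
};

/* Latency to schedule for so that neither core stalls: the larger of
   the two per-core figures, bypass penalty included. */
int worst_latency(const struct latency *l)
{
    int a = l->itanium + l->itanium_extra;
    int b = l->itanium2 + l->itanium2_extra;
    return a > b ? a : b;
}
```
#endif
//
// For example, getf has to be scheduled for 5 ticks (Itanium2) while
// xma->getf needs 8 (Itanium), so neither core's figures alone suffice.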

// Q.	How much faster does it get?
// A.	Here is the output from 'openssl speed rsa dsa' for vanilla
//	0.9.6a compiled with gcc version 2.96 20000731 (Red Hat
//	Linux 7.1 2.96-81):
//
//	                  sign    verify    sign/s verify/s
//	rsa  512 bits   0.0036s   0.0003s    275.3   2999.2
//	rsa 1024 bits   0.0203s   0.0011s     49.3    894.1
//	rsa 2048 bits   0.1331s   0.0040s      7.5    250.9
//	rsa 4096 bits   0.9270s   0.0147s      1.1     68.1
//	                  sign    verify    sign/s verify/s
//	dsa  512 bits   0.0035s   0.0043s    288.3    234.8
//	dsa 1024 bits   0.0111s   0.0135s     90.0     74.2
//
//	And here is similar output but for this assembler
//	implementation:-)
//
//	                  sign    verify    sign/s verify/s
//	rsa  512 bits   0.0021s   0.0001s    549.4   9638.5
//	rsa 1024 bits   0.0055s   0.0002s    183.8   4481.1
//	rsa 2048 bits   0.0244s   0.0006s     41.4   1726.3
//	rsa 4096 bits   0.1295s   0.0018s      7.7    561.5
//	                  sign    verify    sign/s verify/s
//	dsa  512 bits   0.0012s   0.0013s    891.9    756.6
//	dsa 1024 bits   0.0023s   0.0028s    440.4    376.2
//
//	Yes, you may argue that it's not a fair comparison, as it's
//	possible to craft the C implementation with the BN_UMULT_HIGH
//	inline assembler macro. But of course! Here is the output
//	with the macro:
//
//	                  sign    verify    sign/s verify/s
//	rsa  512 bits   0.0020s   0.0002s    495.0   6561.0
//	rsa 1024 bits   0.0086s   0.0004s    116.2   2235.7
//	rsa 2048 bits   0.0519s   0.0015s     19.3    667.3
//	rsa 4096 bits   0.3464s   0.0053s      2.9    187.7
//	                  sign    verify    sign/s verify/s
//	dsa  512 bits   0.0016s   0.0020s    613.1    510.5
//	dsa 1024 bits   0.0045s   0.0054s    221.0    183.9
//
//	My code is still way faster, huh:-) And I believe that even
//	higher performance can be achieved. Note that as keys get
//	longer, the performance gain is larger. Why? According to the
//	profiler there is another player in the field, namely
//	BN_from_montgomery, consuming a larger and larger portion of
//	CPU time as the key size decreases. I therefore consider putting
//	effort into an assembler implementation of the following routine:
//
//	void bn_mul_add_mont (BN_ULONG *rp,BN_ULONG *np,int nl,BN_ULONG n0)
//	{
//	int      i,j;
//	BN_ULONG v;
//	BN_ULONG *nrp=rp+nl;	/* upper half of the 2*nl-word input */
//
//	for (i=0; i<nl; i++)
//		{
//		v=bn_mul_add_words(rp,np,nl,(rp[0]*n0)&BN_MASK2);
//		nrp++;
//		rp++;
//		if (((nrp[-1]+=v)&BN_MASK2) < v)
//			for (j=0; ((++nrp[j])&BN_MASK2) == 0; j++) ;
//		}
//	}
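//
//	To experiment with the routine above, it can be modelled in
//	portable C. What follows is a hypothetical reference sketch, not
//	OpenSSL code: it assumes 64-bit BN_ULONG, uses the GCC/Clang
//	unsigned __int128 extension for the 64x64->128-bit products,
//	and the names mul_add_words, mul_add_mont and mont_n0 are made
//	up here. n0 is -np[0]^-1 mod 2^64, obtained by Newton iteration,
//	and this sketch expects rp to provide 2*nl+1 words with the top
//	one zeroed.
//
#if 0
```c
#include <stdint.h>

typedef uint64_t BN_ULONG;           /* assumption: 64-bit words */
typedef unsigned __int128 BN_ULLONG; /* GCC/Clang extension */

/* rp[0..num-1] += ap[0..num-1] * w; returns the carry-out word. */
static BN_ULONG mul_add_words(BN_ULONG *rp, const BN_ULONG *ap, int num,
                              BN_ULONG w)
{
    BN_ULONG c = 0;
    for (int i = 0; i < num; i++) {
        /* fits in 128 bits: (2^64-1)^2 + 2*(2^64-1) == 2^128-1 */
        BN_ULLONG t = (BN_ULLONG)ap[i] * w + rp[i] + c;
        rp[i] = (BN_ULONG)t;
        c = (BN_ULONG)(t >> 64);
    }
    return c;
}

/* n0 = -n^-1 mod 2^64 for odd n: x = n is correct to 3 bits, and each
   Newton step x *= 2 - n*x doubles that (3, 6, ..., 96 >= 64). */
static BN_ULONG mont_n0(BN_ULONG n)
{
    BN_ULONG x = n;
    for (int k = 0; k < 5; k++)
        x *= 2 - n * x;
    return (BN_ULONG)0 - x;
}

/* The loop from the comment above, with BN_MASK2 dropped (a no-op for
   uint64_t). On return rp[0..nl-1] are zero and rp[nl..2*nl] hold
   rp * 2^(-64*nl) mod np, possibly plus np (no final subtraction). */
static void mul_add_mont(BN_ULONG *rp, const BN_ULONG *np, int nl,
                         BN_ULONG n0)
{
    BN_ULONG *nrp = rp + nl;
    for (int i = 0; i < nl; i++) {
        BN_ULONG v = mul_add_words(rp, np, nl, rp[0] * n0);
        nrp++;
        rp++;
        if ((nrp[-1] += v) < v)           /* propagate the carry upwards */
            for (int j = 0; ++nrp[j] == 0; j++)
                ;
    }
}
```
#endif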
//
//	It might also be beneficial to implement even the combaX
//	variants, as it appears that they can literally unleash the
//	performance (see the comment section for bn_mul_comba8 below).
//
//	And finally, for your reference, the output for 0.9.6a compiled
//	with SGIcc version 0.01.0-12 (keep in mind that at the moment
//	of this writing it's not possible to convince SGIcc to use the
//	BN_UMULT_HIGH inline assembler macro, yet the code is fast,
//	for compiler-generated code that is:-):
//
//	                  sign    verify    sign/s verify/s
//	rsa  512 bits   0.0022s   0.0002s    452.7   5894.3
//	rsa 1024 bits   0.0097s   0.0005s    102.7   2002.9
//	rsa 2048 bits   0.0578s   0.0017s     17.3    600.2
//	rsa 4096 bits   0.3838s   0.0061s      2.6    164.5
//	                  sign    verify    sign/s verify/s
//	dsa  512 bits   0.0018s   0.0022s    547.3    459.6
//	dsa 1024 bits   0.0051s   0.0062s    196.6    161.3
//
//	Oh! Benchmarks were performed on a 733MHz Lion-class Itanium
//	system running Red Hat Linux 7.1 (very special thanks to Ray
//	McCaffity of Williams Communications for providing an account).
//
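//	The claim that the gain grows with key length can be checked
//	against the sign/s columns of the gcc and assembler tables above.
//	A throwaway C sketch (hypothetical names, figures copied from the
//	tables, nothing measured anew):
//
#if 0
```c
/* RSA sign/s for 512, 1024, 2048 and 4096-bit keys. */
static const double sign_c[]   = { 275.3, 49.3, 7.5, 1.1 };   /* gcc 2.96 */
static const double sign_asm[] = { 549.4, 183.8, 41.4, 7.7 }; /* this module */

/* Speedup of the assembler module over vanilla C for key size index i. */
double rsa_sign_speedup(int i)
{
    return sign_asm[i] / sign_c[i];
}
```
#endif
//
//	This gives roughly 2.0x, 3.7x, 5.5x and 7.0x for 512 to 4096 bits.
//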
// Q.	What the heck is 'rum 1<<5' at the end of every function?
// A.	Well, by clearing the "upper FP registers written" bit of the
//	User Mask (UM.mfh, bit 5) I want to excuse the kernel from
//	preserving the upper (f32-f127) FP register bank over process
//	context switches, thus minimizing bus bandwidth consumption
//	during the switch (i.e. after the PKI operation completes and
//	the program is off doing something else, like bulk symmetric
//	encryption). Having said this, I also want to point out that it
//	might be a good idea to compile the whole toolkit (as well as the
//	majority of programs, for that matter) with the
//	-mfixed-range=f32-f127 command line option. No, it doesn't
//	prevent the compiler from writing to the upper bank, but it at
//	least discourages it from doing so. If you don't like the idea,
//	you have the option to compile the module with -Drum=nop.m on
//	the command line.
//

#if defined(_HPUX_SOURCE) && !defined(_LP64)
#define	ADDP	addp4
#else
#define	ADDP	add
#endif
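//
// ADDP copies the pointer arguments into scratch registers (e.g. r14-r16
// in bn_add_words). On HP-UX with 32-bit pointers (_HPUX_SOURCE without
// _LP64) it is addp4, which widens a 32-bit pointer into a 64-bit
// address; elsewhere a plain add suffices. The C function below models
// addp4 as we understand the Itanium architecture manual to describe it
// (low 32 bits of the sum, plus bits 31:30 of the second operand copied
// into bits 62:61); the name is hypothetical and this is only a sketch.
//
#if 0
```c
#include <stdint.h>

/* Sketch of "addp4 r1=imm14,r3": upper 32 bits of the sum are forced to
   zero, then r3{31:30} is copied to r1{62:61} (the region bits). */
uint64_t addp4_model(uint64_t imm, uint64_t r3)
{
    uint64_t res = (imm + r3) & 0xFFFFFFFFu;  /* zero-extend low 32 bits */
    res |= ((r3 >> 30) & 3u) << 61;           /* r3{31:30} -> res{62:61} */
    return res;
}
```
#endif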

#if 1
//
// bn_[add|sub]_words routines.
//
// The loops spin in 2*(n+5) ticks on Itanium (provided that the
// data reside in the L1 cache, i.e. 2 ticks away). It's possible to
// compress the epilogue and get down to 2*n+6, but at the cost of
// scalability: the neat feature of this implementation is that it
// shall automagically spin in n+5 ticks on "wider" IA-64
// implementations:-) I consider the epilogue short enough as it is to
// trade a tiny performance loss on Itanium for scalability.
//
// BN_ULONG bn_add_words(BN_ULONG *rp, BN_ULONG *ap, BN_ULONG *bp,int num)
//
.global	bn_add_words#
.proc	bn_add_words#
.align	64
.skip	32	// makes the loop body aligned at 64-byte boundary
bn_add_words:
	.prologue
	.save	ar.pfs,r2
{ .mii;	alloc		r2=ar.pfs,4,12,0,16
	cmp4.le		p6,p0=r35,r0	};;
{ .mfb;	mov		r8=r0			// return value
(p6)	br.ret.spnt.many	b0	};;

{ .mib;	sub		r10=r35,r0,1
	.save	ar.lc,r3
	mov		r3=ar.lc
	brp.loop.imp	.L_bn_add_words_ctop,.L_bn_add_words_cend-16
					}
{ .mib;	ADDP		r14=0,r32		// rp
	.save	pr,r9
	mov		r9=pr		};;
	.body
{ .mii;	ADDP		r15=0,r33		// ap
	mov		ar.lc=r10
	mov		ar.ec=6		}
{ .mib;	ADDP		r16=0,r34		// bp
	mov		pr.rot=1<<16	};;

.L_bn_add_words_ctop:
{ .mii;	(p16)	ld8		r32=[r16],8	  // b=*(bp++)
	(p18)	add		r39=r37,r34
	(p19)	cmp.ltu.unc	p56,p0=r40,r38	}
{ .mfb;	(p0)	nop.m		0x0
	(p0)	nop.f		0x0
	(p0)	nop.b		0x0		}
{ .mii;	(p16)	ld8		r35=[r15],8	  // a=*(ap++)
	(p58)	cmp.eq.or	p57,p0=-1,r41	  // (p20)
	(p58)	add		r41=1,r41	} // (p20)
{ .mfb;	(p21)	st8		[r14]=r42,8	  // *(rp++)=r
	(p0)	nop.f		0x0
	br.ctop.sptk	.L_bn_add_words_ctop	};;
.L_bn_add_words_cend:

{ .mii;
(p59)	add		r8=1,r8		// return value
	mov		pr=r9,0x1ffff
	mov		ar.lc=r3	}
{ .mbb;	nop.b		0x0
	br.ret.sptk.many	b0	};;
.endp	bn_add_words#
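
//
// The software-pipelined loop above obscures the arithmetic. In plain
// C it should be equivalent to the sketch below (hypothetical name,
// 64-bit BN_ULONG assumed): the carry out of a+b is detected by an
// unsigned compare of the sum against a, as cmp.ltu does, and an
// incoming carry propagates further only when that sum is all ones,
// which is the case the (p58) cmp.eq.or/add pair handles.
//
#if 0
```c
#include <stdint.h>

typedef uint64_t BN_ULONG;   /* assumption: 64-bit words */

/* rp[i] = ap[i] + bp[i] + carry for i < num; returns the final carry. */
BN_ULONG bn_add_words_ref(BN_ULONG *rp, const BN_ULONG *ap,
                          const BN_ULONG *bp, int num)
{
    BN_ULONG carry = 0;
    for (int i = 0; i < num; i++) {
        BN_ULONG t = ap[i] + bp[i];              /* may wrap around */
        BN_ULONG c1 = t < ap[i];                 /* carry out of a+b */
        BN_ULONG c2 = carry & (t == UINT64_MAX); /* all ones + carry-in */
        rp[i] = t + carry;
        carry = c1 | c2;
    }
    return carry;                                /* 0 when num <= 0 */
}
```
#endif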

//
// BN_ULONG bn_sub_words(BN_ULONG *rp, BN_ULONG *ap, BN_ULONG *bp,int num)
//
.global	bn_sub_words#
.proc	bn_sub_words#
.align	64
.skip	32	// makes the loop body aligned at 64-byte boundary
bn_sub_words:
	.prologue
	.save	ar.pfs,r2
{ .mii;	alloc		r2=ar.pfs,4,12,0,16
	cmp4.le		p6,p0=r35,r0	};;
{ .mfb;	mov		r8=r0			// return value
(p6)	br.ret.spnt.many	b0	};;

{ .mib;	sub		r10=r35,r0,1
	.save	ar.lc,r3
	mov		r3=ar.lc
	brp.loop.imp	.L_bn_sub_words_ctop,.L_bn_sub_words_cend-16
					}
{ .mib;	ADDP		r14=0,r32		// rp
	.save	pr,r9
	mov		r9=pr		};;
	.body
{ .mii;	ADDP		r15=0,r33		// ap
	mov		ar.lc=r10
	mov		ar.ec=6		}
{ .mib;	ADDP		r16=0,r34		// bp
	mov		pr.rot=1<<16	};;

.L_bn_sub_words_ctop:
{ .mii;	(p16)	ld8		r32=[r16],8	  // b=*(bp++)
	(p18)	sub		r39=r37,r34
	(p19)	cmp.gtu.unc	p56,p0=r40,r38	}
{ .mfb;	(p0)	nop.m		0x0
	(p0)	nop.f		0x0
	(p0)	nop.b		0x0		}
{ .mii;	(p16)	ld8		r35=[r15],8	  // a=*(ap++)
	(p58)	cmp.eq.or	p57,p0=0,r41	  // (p20)
	(p58)	add		r41=-1,r41	} // (p20)
{ .mbb;	(p21)	st8		[r14]=r42,8	  // *(rp++)=r
	(p0)	nop.b		0x0
	br.ctop.sptk	.L_bn_sub_words_ctop	};;
.L_bn_sub_words_cend:

{ .mii;
(p59)	add		r8=1,r8		// return value
	mov		pr=r9,0x1ffff
	mov		ar.lc=r3	}
{ .mbb;	nop.b		0x0
	br.ret.sptk.many	b0	};;
.endp	bn_sub_words#
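
//
// Likewise, a hypothetical C equivalent of bn_sub_words, again assuming
// 64-bit BN_ULONG. A borrow out of a-b shows up as the difference being
// greater than a (the cmp.gtu test), and an incoming borrow propagates
// further only when that difference is zero (the cmp.eq.or case).
//
#if 0
```c
#include <stdint.h>

typedef uint64_t BN_ULONG;   /* assumption: 64-bit words */

/* rp[i] = ap[i] - bp[i] - borrow for i < num; returns the final borrow. */
BN_ULONG bn_sub_words_ref(BN_ULONG *rp, const BN_ULONG *ap,
                          const BN_ULONG *bp, int num)
{
    BN_ULONG borrow = 0;
    for (int i = 0; i < num; i++) {
        BN_ULONG t = ap[i] - bp[i];        /* may wrap around */
        BN_ULONG b1 = t > ap[i];           /* borrow out of a-b */
        BN_ULONG b2 = borrow & (t == 0);   /* zero minus borrow-in */
        rp[i] = t - borrow;
        borrow = b1 | b2;
    }
    return borrow;                         /* 0 when num <= 0 */
}
```
#endif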
#endif

#if 0
#define XMA_TEMPTATION
#endif

#if 1
//
// BN_ULONG bn_mul_words(BN_ULONG *rp, BN_ULONG *ap, int num, BN_ULONG w)
//
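// For reference, what bn_mul_words computes, as a hypothetical C sketch
// (64-bit BN_ULONG and the GCC/Clang unsigned __int128 extension
// assumed). It captures the arithmetic only; the assembler version
// below moves w into f8 with setf.sig for the floating-point
// multiply-add unit and software-pipelines the loop.
//
#if 0
```c
#include <stdint.h>

typedef uint64_t BN_ULONG;           /* assumption: 64-bit words */
typedef unsigned __int128 BN_ULLONG; /* GCC/Clang extension */

/* rp[i] = low word of ap[i]*w + carry; returns the final carry word. */
BN_ULONG bn_mul_words_ref(BN_ULONG *rp, const BN_ULONG *ap, int num,
                          BN_ULONG w)
{
    BN_ULONG carry = 0;
    for (int i = 0; i < num; i++) {
        BN_ULLONG t = (BN_ULLONG)ap[i] * w + carry; /* cannot overflow */
        rp[i] = (BN_ULONG)t;
        carry = (BN_ULONG)(t >> 64);
    }
    return carry;   /* 0 when num <= 0, like the early exit below */
}
```
#endif
//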
.global	bn_mul_words#
.proc	bn_mul_words#
.align	64
.skip	32	// makes the loop body aligned at 64-byte boundary
bn_mul_words:
	.prologue
	.save	ar.pfs,r2
#ifdef XMA_TEMPTATION
{ .mfi;	alloc		r2=ar.pfs,4,0,0,0	};;
#else
{ .mfi;	alloc		r2=ar.pfs,4,12,0,16	};;
#endif
{ .mib;	mov		r8=r0			// return value
	cmp4.le		p6,p0=r34,r0
(p6)	br.ret.spnt.many	b0		};;

{ .mii;	sub	r10=r34,r0,1
	.save	ar.lc,r3
	mov	r3=ar.lc
	.save	pr,r9
	mov	r9=pr			};;

	.body
{ .mib;	setf.sig	f8=r35	// w
	mov		pr.rot=0x800001<<16
			// ------^----- serves as (p50) at first (p27)
	brp.loop.imp	.L_bn_mul_words_ctop,.L_bn_mul_words_cend-16
					}

#ifndef XMA_TEMPTATION

{ .mmi;	ADDP		r14=0,r32	// rp
	ADDP		r15=0,r33	// ap
	mov		ar.lc=r10	}
{ .mmi;	mov		r40=0		// serves as r35 at first (p27)
	mov		ar.ec=13	};;

// This loop spins in 2*(n+12) ticks. It's scheduled for data in the
// Itanium L2 cache (i.e. 9 ticks away). Floating-point load/store
// instructions bypass the L1 cache, so L2 latency is actually the
// best-case scenario for ldf8. The loop is not scalable and will run in
// 2*(n+12) ticks even on "wider" IA-64 implementations. It's a
// trade-off: an n+24 loop would give us ~5% *overall* performance
// improvement on "wider" IA-64, but would hurt Itanium by about the same
// amount because of its longer epilogue. As it's a matter of a few
// percent in either case, I've chosen to trade scalability for
// development time (you can see this very instruction sequence in the
// bn_mul_add_words loop, which in turn is scalable).
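// Where the tick counts come from (a sketch, assuming standard br.ctop
// semantics). With ar.lc = n-1, the loop body executes n + ar.ec - 1
// times. A kernel issued every II ticks therefore costs
// II*(n + ar.ec - 1) ticks. Here II = 2 and ar.ec = 13, giving
// 2*(n+12). The helper name ctop_loop_ticks is hypothetical.
#if 0	/* C arithmetic sketch only; never assembled */
```c
/* Ticks for a modulo-scheduled br.ctop loop entered with ar.lc = n-1:
 * n kernel iterations plus (ec - 1) epilogue iterations, ii ticks each. */
unsigned long ctop_loop_ticks(unsigned long ii, unsigned long n, unsigned long ec)
{
    return ii * (n + ec - 1);
}
```
#endif
// The same arithmetic fits the other loops in this file. For
// bn_mul_add_words, ar.ec = 11 gives 2*(n+10) on Itanium 2. For the XMA
// loop, ar.ec = 5 gives 6*(n+4).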
.L_bn_mul_words_ctop:
{ .mfi;	(p25)	getf.sig	r36=f52			// low
	(p21)	xmpy.lu		f48=f37,f8
	(p28)	cmp.ltu		p54,p50=r41,r39	}
{ .mfi;	(p16)	ldf8		f32=[r15],8
	(p21)	xmpy.hu		f40=f37,f8
	(p0)	nop.i		0x0		};;
{ .mii;	(p25)	getf.sig	r32=f44			// high
	.pred.rel	"mutex",p50,p54
	(p50)	add		r40=r38,r35		// (p27)
	(p54)	add		r40=r38,r35,1	}	// (p27)
{ .mfb;	(p28)	st8		[r14]=r41,8
	(p0)	nop.f		0x0
	br.ctop.sptk	.L_bn_mul_words_ctop	};;
.L_bn_mul_words_cend:

{ .mii;	nop.m		0x0
.pred.rel	"mutex",p51,p55
(p51)	add		r8=r36,r0
(p55)	add		r8=r36,r0,1	}
{ .mfb;	nop.m	0x0
	nop.f	0x0
	nop.b	0x0			}

#else	// XMA_TEMPTATION

	setf.sig	f37=r0	// serves as carry at (p18) tick
	mov		ar.lc=r10
	mov		ar.ec=5;;

// Most of you examining this code are very likely wondering why in the
// name of Intel the following loop is disabled. Indeed, it looks so neat
// that you'd find it hard to believe there's something wrong with it,
// right? The catch is that every iteration depends on the result of the
// previous one, and that result isn't available instantly. The loop
// therefore spins at the latency of xma minus 1, in other words in
// 6*(n+4) ticks:-( Compare this to the "production" loop above, which
// runs in 2*(n+12) ticks. There the latency problem is worked around by
// moving the dependency to the integer ALU, which has one-tick latency.
// Note that the "distance" between ldf8 and xma is not the latency of
// ldf8, but the *difference* between the xma and ldf8 latencies.
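// The two loops differ only in where the carry dependency lives. This is
// a minimal C sketch assuming 32-bit words; the names are hypothetical.
// The XMA loop below computes ap[i]*w + carry in one xma, so each xma
// must wait for the previous one. The production loop computes the
// products independently of the carry. It then propagates the carry with
// an integer add and an unsigned compare (cmp.ltu), as modelled here.
#if 0	/* C reference model only; never assembled */
```c
#include <stdint.h>

typedef uint32_t word_t;
typedef uint64_t dword_t;

/* Same result as bn_mul_words, shaped like the production loop: the
 * multiplications never wait for the carry, only the cheap adds do. */
word_t mul_words_split(word_t *rp, const word_t *ap, int num, word_t w)
{
    word_t prev_hi = 0;  /* high half of the previous product */
    word_t c = 0;        /* carry bit out of the previous addition */
    for (int i = 0; i < num; i++) {
        dword_t p = (dword_t)ap[i] * w;  /* independent of c */
        word_t lo = (word_t)p;
        word_t t = prev_hi + c;          /* high half <= 2^32-2: cannot wrap */
        word_t s = lo + t;
        c = s < t;                       /* carry out, like cmp.ltu */
        rp[i] = s;
        prev_hi = (word_t)(p >> 32);
    }
    return prev_hi + c;
}
```
#endif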
.L_bn_mul_words_ctop:
{ .mfi;	(p16)	ldf8		f32=[r33],8
	(p18)	xma.hu		f38=f34,f8,f39	}
{ .mfb;	(p20)	stf8		[r32]=f37,8
	(p18)	xma.lu		f35=f34,f8,f39
	br.ctop.sptk	.L_bn_mul_words_ctop	};;
.L_bn_mul_words_cend:

	getf.sig	r8=f41		// the return value

#endif	// XMA_TEMPTATION

{ .mii;	nop.m		0x0
	mov		pr=r9,0x1ffff
	mov		ar.lc=r3	}
{ .mfb;	rum		1<<5		// clear um.mfh
	nop.f		0x0
	br.ret.sptk.many	b0	};;
.endp	bn_mul_words#
#endif

#if 1
//
// BN_ULONG bn_mul_add_words(BN_ULONG *rp, BN_ULONG *ap, int num, BN_ULONG w)
//
.global	bn_mul_add_words#
.proc	bn_mul_add_words#
.align	64
.skip	48	// makes the loop body aligned at 64-byte boundary
bn_mul_add_words:
	.prologue
	.save	ar.pfs,r2
{ .mmi;	alloc		r2=ar.pfs,4,4,0,8
	cmp4.le		p6,p0=r34,r0
	.save	ar.lc,r3
	mov		r3=ar.lc	};;
{ .mib;	mov		r8=r0		// return value
	sub		r10=r34,r0,1
(p6)	br.ret.spnt.many	b0	};;

{ .mib;	setf.sig	f8=r35		// w
	.save	pr,r9
	mov		r9=pr
	brp.loop.imp	.L_bn_mul_add_words_ctop,.L_bn_mul_add_words_cend-16
					}
	.body
{ .mmi;	ADDP		r14=0,r32	// rp
	ADDP		r15=0,r33	// ap
	mov		ar.lc=r10	}
{ .mii;	ADDP		r16=0,r32	// rp copy
	mov		pr.rot=0x2001<<16
			// ------^----- serves as (p40) at first (p27)
	mov		ar.ec=11	};;

// This loop spins in 3*(n+10) ticks on Itanium and in 2*(n+10) on
// Itanium 2. Yes, unlike previous versions it scales:-) The previous
// version performed *all* additions in the IALU and was starved for
// them even on Itanium 2. In this version one addition is moved to the
// FPU and folded into the multiplication. This comes at the cost of
// propagating the result from the previous call to this subroutine to
// the L2 cache... In other words, the cost is negligible even for
// shorter keys. *Overall* performance improvement [over the previous
// version] varies from 11 to 22 percent depending on key length.
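// A C sketch of the split described above, assuming 32-bit words (the
// names are hypothetical). The rp[i] addend is folded into the
// multiplication, as the xma.lu/xma.hu pair does, and only the carry
// addition is left to the integer unit. The fused sum cannot overflow
// the double word because (2^32-1)^2 + (2^32-1) < 2^64.
#if 0	/* C reference model only; never assembled */
```c
#include <stdint.h>

typedef uint32_t word_t;
typedef uint64_t dword_t;

/* rp[i] += ap[i]*w with carry propagation; returns the carry-out word. */
word_t mul_add_words_ref(word_t *rp, const word_t *ap, int num, word_t w)
{
    word_t c = 0;
    for (int i = 0; i < num; i++) {
        /* "FPU" part: multiply and add rp[i] in one step, no overflow */
        dword_t t = (dword_t)ap[i] * w + rp[i];
        word_t lo = (word_t)t;
        word_t hi = (word_t)(t >> 32);
        /* "IALU" part: the one remaining addition, of the carry word */
        word_t s = lo + c;
        rp[i] = s;
        /* cannot wrap: hi == 2^32-1 forces lo == 0, so s < c is then 0 */
        c = hi + (s < c);
    }
    return c;
}
```
#endif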
.L_bn_mul_add_words_ctop:
.pred.rel	"mutex",p40,p42
{ .mfi;	(p23)	getf.sig	r36=f45			// low
	(p20)	xma.lu		f42=f36,f8,f50		// low
	(p40)	add		r39=r39,r35	}	// (p27)
{ .mfi;	(p16)	ldf8		f32=[r15],8		// *(ap++)
	(p20)	xma.hu		f36=f36,f8,f50		// high
	(p42)	add		r39=r39,r35,1	};;	// (p27)
{ .mmi;	(p24)	getf.sig	r32=f40			// high
	(p16)	ldf8		f46=[r16],8		// *(rp1++)
	(p40)	cmp.ltu		p41,p39=r39,r35	}	// (p27)
{ .mib;	(p26)	st8		[r14]=r39,8		// *(rp2++)
	(p42)	cmp.leu		p41,p39=r39,r35		// (p27)
	br.ctop.sptk	.L_bn_mul_add_words_ctop};;
.L_bn_mul_add_words_cend:

{ .mmi;	.pred.rel	"mutex",p40,p42
(p40)	add		r8=r35,r0
(p42)	add		r8=r35,r0,1
	mov		pr=r9,0x1ffff	}
{ .mib;	rum		1<<5		// clear um.mfh
	mov		ar.lc=r3
	br.ret.sptk.many	b0	};;
.endp	bn_mul_add_words#
#endif

#if 1
//
// void bn_sqr_words(BN_ULONG *rp, BN_ULONG *ap, int num)
//
.global	bn_sqr_words#
.proc	bn_sqr_words#
.align	64
.skip	32	// makes the loop body aligned at 64-byte boundary
bn_sqr_words:
	.prologue
	.save	ar.pfs,r2
{ .mii;	alloc		r2=ar.pfs,3,0,0,0
	sxt4		r34=r34		};;
{ .mii;	cmp.le		p6,p0=r34,r0
	mov		r8=r0		}	// return value
{ .mfb;	ADDP		r32=0,r32
	nop.f		0x0
(p6)	br.ret.spnt.many	b0	};;

{ .mii;	sub	r10=r34,r0,1
	.save	ar.lc,r3
	mov	r3=ar.lc
	.save	pr,r9
	mov	r9=pr			};;

	.body
{ .mib;	ADDP		r33=0,r33
	mov		pr.rot=1<<16
	brp.loop.imp	.L_bn_sqr_words_ctop,.L_bn_sqr_words_cend-16
					}
{ .mii;	add		r34=8,r32
	mov		ar.lc=r10
	mov		ar.ec=18	};;

// 2*(n+17) ticks on Itanium, (n+17) on "wider" IA-64 implementations.
// It's possible to compress the epilogue (I'm getting tired of writing
// this comment over and over) and get down to 2*n+16 at the cost of
// scalability. The decision will very likely be reconsidered after the
// benchmark program is profiled. That is, if the performance gain on
// Itanium turns out larger than the loss on "wider" IA-64, then the
// loop should be explicitly split and the epilogue compressed.
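// A C reference for bn_sqr_words, as a sketch assuming 32-bit words (the
// names are hypothetical). Each ap[i] is squared into two adjacent output
// words, with no carry between iterations. For that reason this loop,
// unlike bn_mul_words, needs no integer carry chain.
#if 0	/* C reference model only; never assembled */
```c
#include <stdint.h>

typedef uint32_t word_t;
typedef uint64_t dword_t;

/* rp[2i] and rp[2i+1] receive the low and high halves of ap[i]^2. */
void sqr_words_ref(word_t *rp, const word_t *ap, int num)
{
    for (int i = 0; i < num; i++) {
        dword_t sq = (dword_t)ap[i] * ap[i];  /* xmpy.lu / xmpy.hu pair */
        rp[2 * i] = (word_t)sq;               /* low half (stored via r32) */
        rp[2 * i + 1] = (word_t)(sq >> 32);   /* high half (via r34 = r32+8) */
    }
}
```
#endif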
.L_bn_sqr_words_ctop:
{ .mfi;	(p16)	ldf8		f32=[r33],8
	(p25)	xmpy.lu		f42=f41,f41
	(p0)	nop.i		0x0		}
{ .mib;	(p33)	stf8		[r32]=f50,16
	(p0)	nop.i		0x0
	(p0)	nop.b		0x0		}
{ .mfi;	(p0)	nop.m		0x0
	(p25)	xmpy.hu		f52=f41,f41
	(p0)	nop.i		0x0		}
{ .mib;	(p33)	stf8		[r34]=f60,16
	(p0)	nop.i		0x0
	br.ctop.sptk	.L_bn_sqr_words_ctop	};;
.L_bn_sqr_words_cend:

{ .mii;	nop.m		0x0
	mov		pr=r9,0x1ffff
	mov		ar.lc=r3	}
{ .mfb;	rum		1<<5		// clear um.mfh
	nop.f		0x0
	br.ret.sptk.many	b0	};;
.endp	bn_sqr_words#
#endif

#if 1
// Apparently we win nothing by implementing a special bn_sqr_comba8.
// Yes, it is possible to reduce the number of multiplications by
// almost a factor of two, but then the number of additions would
// increase by a factor of two (as we would have to perform ourselves
// the additions otherwise performed by xma). Normally we would make the
// trade anyway, as multiplications are way more expensive, but not this
// time... The multiplication kernel is fully pipelined, and as we drain
// one 128-bit multiplication result per clock cycle, multiplications
// are effectively as inexpensive as additions. A special implementation
// might become of interest for "wider" IA-64 implementations, as you'll
// be able to get through the multiplication phase faster (there won't
// be any stall issues as discussed in the commentary section below, and
// you will therefore be able to employ all 4 FP units)... But in these
// Itanium days it's simply too hard to justify the effort, so I just
// drop down to the bn_mul_comba8 code:-)
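// For comparison, the "special" squaring declined above might look
// roughly like this C sketch. It is a textbook technique, not this
// module's code (the bn_sqr_comba8 entry below simply reuses the
// bn_mul_comba8 code with b = a); 32-bit words and all names are
// hypothetical. Each cross product a[i]*a[j] with i<j is computed once,
// and their sum is doubled with a one-bit shift. The squares a[i]^2 are
// added last. That is n*(n+1)/2 multiplications instead of n*n (36
// instead of 64 for n = 8), paid for with extra doubling and carry
// passes.
#if 0	/* C sketch only; never assembled */
```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t word_t;
typedef uint64_t dword_t;

/* r[0..n-1] += a[0..n-1] * w; returns the carry-out word. */
static word_t mul_add_row(word_t *r, const word_t *a, size_t n, word_t w)
{
    word_t c = 0;
    for (size_t i = 0; i < n; i++) {
        dword_t t = (dword_t)a[i] * w + r[i] + c;  /* at most 2^64 - 1 */
        r[i] = (word_t)t;
        c = (word_t)(t >> 32);
    }
    return c;
}

/* r[0..2n-1] = a^2 using n*(n+1)/2 word multiplications. */
void sqr_halved(word_t *r, const word_t *a, size_t n)
{
    for (size_t i = 0; i < 2 * n; i++)
        r[i] = 0;

    /* 1. cross products a[i]*a[j], i < j: n*(n-1)/2 multiplications */
    for (size_t i = 0; i + 1 < n; i++)
        r[i + n] = mul_add_row(&r[2 * i + 1], &a[i + 1], n - i - 1, a[i]);

    /* 2. double them: shift the whole 2n-word value left by one bit */
    word_t top = 0;
    for (size_t i = 0; i < 2 * n; i++) {
        word_t next = r[i] >> 31;
        r[i] = (r[i] << 1) | top;
        top = next;
    }

    /* 3. add the n squares a[i]^2 into r[2i], r[2i+1] */
    word_t c = 0;
    for (size_t i = 0; i < n; i++) {
        dword_t sq = (dword_t)a[i] * a[i];
        dword_t s = (dword_t)r[2 * i] + (word_t)sq + c;
        r[2 * i] = (word_t)s;
        s = (dword_t)r[2 * i + 1] + (word_t)(sq >> 32) + (word_t)(s >> 32);
        r[2 * i + 1] = (word_t)s;
        c = (word_t)(s >> 32);
    }
}
```
#endif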
539656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project//
540656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// void bn_sqr_comba8(BN_ULONG *r, BN_ULONG *a)
541656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project//
542656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project.global	bn_sqr_comba8#
543656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project.proc	bn_sqr_comba8#
544656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project.align	64
545656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Projectbn_sqr_comba8:
546656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project	.prologue
547656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project	.save	ar.pfs,r2
548656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project#if defined(_HPUX_SOURCE) && !defined(_LP64)
549656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project{ .mii;	alloc	r2=ar.pfs,2,1,0,0
550656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project	addp4	r33=0,r33
551656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project	addp4	r32=0,r32		};;
552656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project{ .mii;
553656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project#else
554656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project{ .mii;	alloc	r2=ar.pfs,2,1,0,0
555656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project#endif
556656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project	mov	r34=r33
557656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project	add	r14=8,r33		};;
558656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project	.body
559656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project{ .mii;	add	r17=8,r34
560656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project	add	r15=16,r33
561656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project	add	r18=16,r34		}
562656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project{ .mfb;	add	r16=24,r33
563656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project	br	.L_cheat_entry_point8	};;
564656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project.endp	bn_sqr_comba8#
565656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project#endif
566656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project
567656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project#if 1
568656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// I've estimated this routine to run in ~120 ticks, but in reality
569656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// (i.e. according to ar.itc) it takes ~160 ticks. Are those extra
570656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// cycles consumed for instructions fetch? Or did I misinterpret some
571656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// clause in Itanium �-architecture manual? Comments are welcomed and
572656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// highly appreciated.
573656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project//
574656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// On Itanium 2 it takes ~190 ticks. This is because of stalls on
575656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// result from getf.sig. I do nothing about it at this point for
576656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// reasons depicted below.
577656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project//
578656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// However! It should be noted that even 160 ticks is darn good result
579656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// as it's over 10 (yes, ten, spelled as t-e-n) times faster than the
580656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// C version (compiled with gcc with inline assembler). I really
581656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project// kicked compiler's butt here, didn't I? Yeah! This brings us to the
// following statement. It's a damn shame that this routine isn't called
// very often nowadays! According to the profiler, most CPU time is
// consumed by bn_mul_add_words called from BN_from_montgomery. In
// order to estimate what we're missing, I've compared the performance
// of this routine against a "traditional" implementation, i.e. against
// the following routine:
//
// void bn_mul_comba8(BN_ULONG *r, BN_ULONG *a, BN_ULONG *b)
// {	r[ 8]=bn_mul_words(    &(r[0]),a,8,b[0]);
//	r[ 9]=bn_mul_add_words(&(r[1]),a,8,b[1]);
//	r[10]=bn_mul_add_words(&(r[2]),a,8,b[2]);
//	r[11]=bn_mul_add_words(&(r[3]),a,8,b[3]);
//	r[12]=bn_mul_add_words(&(r[4]),a,8,b[4]);
//	r[13]=bn_mul_add_words(&(r[5]),a,8,b[5]);
//	r[14]=bn_mul_add_words(&(r[6]),a,8,b[6]);
//	r[15]=bn_mul_add_words(&(r[7]),a,8,b[7]);
// }
//
// The one below is over 8 times faster than the one above :-( All the
// more reason to "combafy" bn_mul_add_mont...
//
// And yes, this routine really made me wish there were an optimizing
// assembler! It also feels like it deserves a dedication.
//
//	To my wife for being there and to my kids...
//
// void bn_mul_comba8(BN_ULONG *r, BN_ULONG *a, BN_ULONG *b)
//
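// What the code below computes, as a minimal C sketch (illustration
// only, it is not compiled). It assumes a 64-bit BN_ULONG and the
// GCC/Clang "unsigned __int128" extension. ref_comba8 and mul_add128
// are hypothetical names. Each row a[]*b[j] is built the way the
// xma.lu/xma.hu pairs build it, with the high half of a[i]*b[j]+h
// fed into the next step. Every r[k] is then a column sum of row
// words, and carries are counted the way cmp.ltu plus a predicated
// "add carry=1,carry" counts them. The assembly interleaves columns
// and adds the incoming carry last, which yields the same words.
//
//	#include <stdint.h>
//
//	/* low and high 64 bits of a*b+c, cf. xma.lu and xma.hu */
//	static void mul_add128(uint64_t a, uint64_t b, uint64_t c,
//			       uint64_t *lo, uint64_t *hi)
//	{	unsigned __int128 t = (unsigned __int128)a*b + c;
//		*lo = (uint64_t)t;
//		*hi = (uint64_t)(t>>64);
//	}
//
//	void ref_comba8(uint64_t r[16], const uint64_t a[8],
//			const uint64_t b[8])
//	{	uint64_t row[8][9], carry = 0;
//		int i, j, k;
//
//		for (j = 0; j < 8; j++) {	/* row[j] = a*b[j], 9 words */
//			uint64_t h = 0;
//			for (i = 0; i < 8; i++)
//				mul_add128(a[i], b[j], h, &row[j][i], &h);
//			row[j][8] = h;
//		}
//		for (k = 0; k < 16; k++) {	/* r[k] = column k + carry */
//			uint64_t sum = carry, next = 0;
//			for (j = 0; j < 8; j++) {
//				i = k - j;
//				if (i < 0 || i > 8)
//					continue;
//				sum += row[j][i];
//				next += (sum < row[j][i]);	/* wrapped? */
//			}
//			r[k] = sum;
//			carry = next;
//		}
//	}
//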
#define	carry1	r14
#define	carry2	r15
#define	carry3	r34
.global	bn_mul_comba8#
.proc	bn_mul_comba8#
.align	64
bn_mul_comba8:
	.prologue
	.save	ar.pfs,r2
#if defined(_HPUX_SOURCE) && !defined(_LP64)
{ .mii;	alloc	r2=ar.pfs,3,0,0,0
	addp4	r33=0,r33
	addp4	r34=0,r34		};;
{ .mii;	addp4	r32=0,r32
#else
{ .mii;	alloc   r2=ar.pfs,3,0,0,0
#endif
	add	r14=8,r33
	add	r17=8,r34		}
	.body
{ .mii;	add	r15=16,r33
	add	r18=16,r34
	add	r16=24,r33		}
.L_cheat_entry_point8:
{ .mmi;	add	r19=24,r34

	ldf8	f32=[r33],32		};;

{ .mmi;	ldf8	f120=[r34],32
	ldf8	f121=[r17],32		}
{ .mmi;	ldf8	f122=[r18],32
	ldf8	f123=[r19],32		};;
{ .mmi;	ldf8	f124=[r34]
	ldf8	f125=[r17]		}
{ .mmi;	ldf8	f126=[r18]
	ldf8	f127=[r19]		}

{ .mmi;	ldf8	f33=[r14],32
	ldf8	f34=[r15],32		}
{ .mmi;	ldf8	f35=[r16],32;;
	ldf8	f36=[r33]		}
{ .mmi;	ldf8	f37=[r14]
	ldf8	f38=[r15]		}
{ .mfi;	ldf8	f39=[r16]
// -------\ Entering multiplier's heaven /-------
// ------------\                    /------------
// -----------------\          /-----------------
// ----------------------\/----------------------
		xma.hu	f41=f32,f120,f0		}
{ .mfi;		xma.lu	f40=f32,f120,f0		};; // (*)
{ .mfi;		xma.hu	f51=f32,f121,f0		}
{ .mfi;		xma.lu	f50=f32,f121,f0		};;
{ .mfi;		xma.hu	f61=f32,f122,f0		}
{ .mfi;		xma.lu	f60=f32,f122,f0		};;
{ .mfi;		xma.hu	f71=f32,f123,f0		}
{ .mfi;		xma.lu	f70=f32,f123,f0		};;
{ .mfi;		xma.hu	f81=f32,f124,f0		}
{ .mfi;		xma.lu	f80=f32,f124,f0		};;
{ .mfi;		xma.hu	f91=f32,f125,f0		}
{ .mfi;		xma.lu	f90=f32,f125,f0		};;
{ .mfi;		xma.hu	f101=f32,f126,f0	}
{ .mfi;		xma.lu	f100=f32,f126,f0	};;
{ .mfi;		xma.hu	f111=f32,f127,f0	}
{ .mfi;		xma.lu	f110=f32,f127,f0	};;//
// (*)	You can argue that splitting at every second bundle (i.e.
//	placing a ";;" stop after every second bundle) would prevent
//	"wider" IA-64 implementations from achieving peak performance.
//	Well, not really... The catch is that if you intend to keep
//	4 FP units busy by splitting at every fourth bundle, and thus
//	perform these 16 multiplications in 4 ticks, the first bundle
//	*below* would stall, because the result from the first xma
//	bundle *above* won't be available for another 3 ticks (if not
//	more; being an optimist, I assume that a "wider" implementation
//	will have the same latency:-). This stall will hold you back,
//	and the performance would be as if every second bundle were
//	split *anyway*...
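//
// Why chaining the high half into the next xma.lu/xma.hu pair is
// exact: xma.lu f1=f3,f4,f2 and xma.hu f1=f3,f4,f2 return the low
// and high 64 bits of f3*f4+f2, and for 64-bit operands
//
//	(2^64-1)*(2^64-1) + (2^64-1) = 2^128 - 2^65 + 1 + 2^64 - 1
//	                             = 2^128 - 2^64 < 2^128,
//
// so the 128-bit result can never overflow. For example, the first
// pair below computes a[1]*b[0]+f41 with f41 = hi(a[0]*b[0]), and
// the same holds at every step along each row, whatever the inputs.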
{ .mfi;	getf.sig	r16=f40
		xma.hu	f42=f33,f120,f41
	add		r33=8,r32		}
{ .mfi;		xma.lu	f41=f33,f120,f41	};;
{ .mfi;	getf.sig	r24=f50
		xma.hu	f52=f33,f121,f51	}
{ .mfi;		xma.lu	f51=f33,f121,f51	};;
{ .mfi;	st8		[r32]=r16,16
		xma.hu	f62=f33,f122,f61	}
{ .mfi;		xma.lu	f61=f33,f122,f61	};;
{ .mfi;		xma.hu	f72=f33,f123,f71	}
{ .mfi;		xma.lu	f71=f33,f123,f71	};;
{ .mfi;		xma.hu	f82=f33,f124,f81	}
{ .mfi;		xma.lu	f81=f33,f124,f81	};;
{ .mfi;		xma.hu	f92=f33,f125,f91	}
{ .mfi;		xma.lu	f91=f33,f125,f91	};;
{ .mfi;		xma.hu	f102=f33,f126,f101	}
{ .mfi;		xma.lu	f101=f33,f126,f101	};;
{ .mfi;		xma.hu	f112=f33,f127,f111	}
{ .mfi;		xma.lu	f111=f33,f127,f111	};;//
//-------------------------------------------------//
{ .mfi;	getf.sig	r25=f41
		xma.hu	f43=f34,f120,f42	}
{ .mfi;		xma.lu	f42=f34,f120,f42	};;
{ .mfi;	getf.sig	r16=f60
		xma.hu	f53=f34,f121,f52	}
{ .mfi;		xma.lu	f52=f34,f121,f52	};;
{ .mfi;	getf.sig	r17=f51
		xma.hu	f63=f34,f122,f62
	add		r25=r25,r24		}
{ .mfi;		xma.lu	f62=f34,f122,f62
	mov		carry1=0		};;
{ .mfi;	cmp.ltu		p6,p0=r25,r24
		xma.hu	f73=f34,f123,f72	}
{ .mfi;		xma.lu	f72=f34,f123,f72	};;
{ .mfi;	st8		[r33]=r25,16
		xma.hu	f83=f34,f124,f82
(p6)	add		carry1=1,carry1		}
{ .mfi;		xma.lu	f82=f34,f124,f82	};;
{ .mfi;		xma.hu	f93=f34,f125,f92	}
{ .mfi;		xma.lu	f92=f34,f125,f92	};;
{ .mfi;		xma.hu	f103=f34,f126,f102	}
{ .mfi;		xma.lu	f102=f34,f126,f102	};;
{ .mfi;		xma.hu	f113=f34,f127,f112	}
{ .mfi;		xma.lu	f112=f34,f127,f112	};;//
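// The integer instructions interleaved below turn the row words into
// output words. Each r[k] is a column sum. An unsigned wrap-around
// of "add x=x,y" is detected as x < y by cmp.ltu, after which a
// predicated add bumps a carry counter. carry1 and carry2 alternate
// between columns: while one column's count is being added into the
// next column, the other register collects that column's carries. A
// minimal C model of r[2] as computed below follows. It is a sketch
// assuming <stdint.h>. column2 is a hypothetical name, and the
// parameters echo the registers holding the words of rows 2, 1 and 0
// that land in column 2, plus the carry count left by r[1]:
//
//	static uint64_t column2(uint64_t f60, uint64_t f51, uint64_t f42,
//				uint64_t carry1, uint64_t *carry2)
//	{	uint64_t r17, r18;
//
//		r17 = f51 + f60;	*carry2  = (r17 < f60);
//		r18 = f42 + r17;	*carry2 += (r18 < r17);
//		r18 += carry1;		*carry2 += (r18 < carry1);
//		return r18;		/* stored to r[2]; carry2 feeds r[3] */
//	}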
//-------------------------------------------------//
{ .mfi;	getf.sig	r18=f42
		xma.hu	f44=f35,f120,f43
	add		r17=r17,r16		}
{ .mfi;		xma.lu	f43=f35,f120,f43	};;
{ .mfi;	getf.sig	r24=f70
		xma.hu	f54=f35,f121,f53	}
{ .mfi;	mov		carry2=0
		xma.lu	f53=f35,f121,f53	};;
{ .mfi;	getf.sig	r25=f61
		xma.hu	f64=f35,f122,f63
	cmp.ltu		p7,p0=r17,r16		}
{ .mfi;	add		r18=r18,r17
		xma.lu	f63=f35,f122,f63	};;
{ .mfi;	getf.sig	r26=f52
		xma.hu	f74=f35,f123,f73
(p7)	add		carry2=1,carry2		}
{ .mfi;	cmp.ltu		p7,p0=r18,r17
		xma.lu	f73=f35,f123,f73
	add		r18=r18,carry1		};;
{ .mfi;
		xma.hu	f84=f35,f124,f83
(p7)	add		carry2=1,carry2		}
{ .mfi;	cmp.ltu		p7,p0=r18,carry1
		xma.lu	f83=f35,f124,f83	};;
{ .mfi;	st8		[r32]=r18,16
		xma.hu	f94=f35,f125,f93
(p7)	add		carry2=1,carry2		}
{ .mfi;		xma.lu	f93=f35,f125,f93	};;
{ .mfi;		xma.hu	f104=f35,f126,f103	}
{ .mfi;		xma.lu	f103=f35,f126,f103	};;
{ .mfi;		xma.hu	f114=f35,f127,f113	}
{ .mfi;	mov		carry1=0
		xma.lu	f113=f35,f127,f113
	add		r25=r25,r24		};;//
//-------------------------------------------------//
{ .mfi;	getf.sig	r27=f43
		xma.hu	f45=f36,f120,f44
	cmp.ltu		p6,p0=r25,r24		}
{ .mfi;		xma.lu	f44=f36,f120,f44
	add		r26=r26,r25		};;
{ .mfi;	getf.sig	r16=f80
		xma.hu	f55=f36,f121,f54
(p6)	add		carry1=1,carry1		}
{ .mfi;		xma.lu	f54=f36,f121,f54	};;
{ .mfi;	getf.sig	r17=f71
		xma.hu	f65=f36,f122,f64
	cmp.ltu		p6,p0=r26,r25		}
{ .mfi;		xma.lu	f64=f36,f122,f64
	add		r27=r27,r26		};;
{ .mfi;	getf.sig	r18=f62
		xma.hu	f75=f36,f123,f74
(p6)	add		carry1=1,carry1		}
{ .mfi;	cmp.ltu		p6,p0=r27,r26
		xma.lu	f74=f36,f123,f74
	add		r27=r27,carry2		};;
{ .mfi;	getf.sig	r19=f53
		xma.hu	f85=f36,f124,f84
(p6)	add		carry1=1,carry1		}
{ .mfi;		xma.lu	f84=f36,f124,f84
	cmp.ltu		p6,p0=r27,carry2	};;
{ .mfi;	st8		[r33]=r27,16
		xma.hu	f95=f36,f125,f94
(p6)	add		carry1=1,carry1		}
{ .mfi;		xma.lu	f94=f36,f125,f94	};;
{ .mfi;		xma.hu	f105=f36,f126,f104	}
{ .mfi;	mov		carry2=0
		xma.lu	f104=f36,f126,f104
	add		r17=r17,r16		};;
{ .mfi;		xma.hu	f115=f36,f127,f114
	cmp.ltu		p7,p0=r17,r16		}
{ .mfi;		xma.lu	f114=f36,f127,f114
	add		r18=r18,r17		};;//
//-------------------------------------------------//
{ .mfi;	getf.sig	r20=f44
		xma.hu	f46=f37,f120,f45
(p7)	add		carry2=1,carry2		}
{ .mfi;	cmp.ltu		p7,p0=r18,r17
		xma.lu	f45=f37,f120,f45
	add		r19=r19,r18		};;
{ .mfi;	getf.sig	r24=f90
		xma.hu	f56=f37,f121,f55	}
{ .mfi;		xma.lu	f55=f37,f121,f55	};;
{ .mfi;	getf.sig	r25=f81
		xma.hu	f66=f37,f122,f65
(p7)	add		carry2=1,carry2		}
{ .mfi;	cmp.ltu		p7,p0=r19,r18
		xma.lu	f65=f37,f122,f65
	add		r20=r20,r19		};;
{ .mfi;	getf.sig	r26=f72
		xma.hu	f76=f37,f123,f75
(p7)	add		carry2=1,carry2		}
{ .mfi;	cmp.ltu		p7,p0=r20,r19
		xma.lu	f75=f37,f123,f75
	add		r20=r20,carry1		};;
{ .mfi;	getf.sig	r27=f63
		xma.hu	f86=f37,f124,f85
(p7)	add		carry2=1,carry2		}
{ .mfi;		xma.lu	f85=f37,f124,f85
	cmp.ltu		p7,p0=r20,carry1	};;
{ .mfi;	getf.sig	r28=f54
		xma.hu	f96=f37,f125,f95
(p7)	add		carry2=1,carry2		}
{ .mfi;	st8		[r32]=r20,16
		xma.lu	f95=f37,f125,f95	};;
{ .mfi;		xma.hu	f106=f37,f126,f105	}
{ .mfi;	mov		carry1=0
		xma.lu	f105=f37,f126,f105
	add		r25=r25,r24		};;
{ .mfi;		xma.hu	f116=f37,f127,f115
	cmp.ltu		p6,p0=r25,r24		}
{ .mfi;		xma.lu	f115=f37,f127,f115
	add		r26=r26,r25		};;//
//-------------------------------------------------//
{ .mfi;	getf.sig	r29=f45
		xma.hu	f47=f38,f120,f46
(p6)	add		carry1=1,carry1		}
{ .mfi;	cmp.ltu		p6,p0=r26,r25
		xma.lu	f46=f38,f120,f46
	add		r27=r27,r26		};;
{ .mfi;	getf.sig	r16=f100
		xma.hu	f57=f38,f121,f56
(p6)	add		carry1=1,carry1		}
{ .mfi;	cmp.ltu		p6,p0=r27,r26
		xma.lu	f56=f38,f121,f56
	add		r28=r28,r27		};;
{ .mfi;	getf.sig	r17=f91
		xma.hu	f67=f38,f122,f66
(p6)	add		carry1=1,carry1		}
{ .mfi;	cmp.ltu		p6,p0=r28,r27
		xma.lu	f66=f38,f122,f66
	add		r29=r29,r28		};;
{ .mfi;	getf.sig	r18=f82
		xma.hu	f77=f38,f123,f76
(p6)	add		carry1=1,carry1		}
{ .mfi;	cmp.ltu		p6,p0=r29,r28
		xma.lu	f76=f38,f123,f76
	add		r29=r29,carry2		};;
{ .mfi;	getf.sig	r19=f73
		xma.hu	f87=f38,f124,f86
(p6)	add		carry1=1,carry1		}
{ .mfi;		xma.lu	f86=f38,f124,f86
	cmp.ltu		p6,p0=r29,carry2	};;
{ .mfi;	getf.sig	r20=f64
		xma.hu	f97=f38,f125,f96
(p6)	add		carry1=1,carry1		}
{ .mfi;	st8		[r33]=r29,16
		xma.lu	f96=f38,f125,f96	};;
{ .mfi;	getf.sig	r21=f55
		xma.hu	f107=f38,f126,f106	}
{ .mfi;	mov		carry2=0
		xma.lu	f106=f38,f126,f106
	add		r17=r17,r16		};;
{ .mfi;		xma.hu	f117=f38,f127,f116
	cmp.ltu		p7,p0=r17,r16		}
{ .mfi;		xma.lu	f116=f38,f127,f116
	add		r18=r18,r17		};;//
//-------------------------------------------------//
{ .mfi;	getf.sig	r22=f46
		xma.hu	f48=f39,f120,f47
(p7)	add		carry2=1,carry2		}
{ .mfi;	cmp.ltu		p7,p0=r18,r17
		xma.lu	f47=f39,f120,f47
	add		r19=r19,r18		};;
{ .mfi;	getf.sig	r24=f110
		xma.hu	f58=f39,f121,f57
(p7)	add		carry2=1,carry2		}
{ .mfi;	cmp.ltu		p7,p0=r19,r18
		xma.lu	f57=f39,f121,f57
	add		r20=r20,r19		};;
{ .mfi;	getf.sig	r25=f101
		xma.hu	f68=f39,f122,f67
(p7)	add		carry2=1,carry2		}
{ .mfi;	cmp.ltu		p7,p0=r20,r19
		xma.lu	f67=f39,f122,f67
	add		r21=r21,r20		};;
{ .mfi;	getf.sig	r26=f92
		xma.hu	f78=f39,f123,f77
(p7)	add		carry2=1,carry2		}
{ .mfi;	cmp.ltu		p7,p0=r21,r20
		xma.lu	f77=f39,f123,f77
	add		r22=r22,r21		};;
{ .mfi;	getf.sig	r27=f83
		xma.hu	f88=f39,f124,f87
(p7)	add		carry2=1,carry2		}
{ .mfi;	cmp.ltu		p7,p0=r22,r21
		xma.lu	f87=f39,f124,f87
	add		r22=r22,carry1		};;
{ .mfi;	getf.sig	r28=f74
		xma.hu	f98=f39,f125,f97
(p7)	add		carry2=1,carry2		}
{ .mfi;		xma.lu	f97=f39,f125,f97
	cmp.ltu		p7,p0=r22,carry1	};;
{ .mfi;	getf.sig	r29=f65
		xma.hu	f108=f39,f126,f107
(p7)	add		carry2=1,carry2		}
{ .mfi;	st8		[r32]=r22,16
		xma.lu	f107=f39,f126,f107	};;
928656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project{ .mfi;	getf.sig	r30=f56
929656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project		xma.hu	f118=f39,f127,f117	}
930656d9c7f52f88b3a3daccafa7655dec086c4756eThe Android Open Source Project{ .mfi;		xma.lu	f117=f39,f127,f117	};;//
//-------------------------------------------------//
// Leaving multiplier's heaven... Quite a ride, huh?
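/*
 * A C sketch of the carry-counting idiom repeated in every column
 * below: after "add rX=rX,rY", "cmp.ltu p6,p0=rX,rY" is true exactly
 * when the 64-bit sum wrapped (an unsigned sum that wraps ends up
 * smaller than the addend), and "(p6) add carry1=1,carry1" counts the
 * wrap. The function name add_count_carry is hypothetical; this models
 * the idiom only and is not part of the build.
 *
```c
#include <assert.h>
#include <stdint.h>

// Adds y to *acc and bumps *carry when the 64-bit sum wraps, mirroring
//   add rX=rX,rY ; cmp.ltu p6,p0=rX,rY ; (p6) add carry1=1,carry1
// An unsigned sum wrapped exactly when it is smaller than the addend.
static void add_count_carry(uint64_t *acc, uint64_t y, uint64_t *carry)
{
    *acc += y;
    if (*acc < y)
        *carry += 1;
}
```
 * The counted wraps are later folded into the next column, as in
 * "add r31=r31,carry2" below.
 */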
{ .mii;	getf.sig	r31=f47
	add		r25=r25,r24
	mov		carry1=0		};;
{ .mii;		getf.sig	r16=f111
	cmp.ltu		p6,p0=r25,r24
	add		r26=r26,r25		};;
{ .mfb;		getf.sig	r17=f102	}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r26,r25
	add		r27=r27,r26		};;
{ .mfb;	nop.m	0x0				}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r27,r26
	add		r28=r28,r27		};;
{ .mii;		getf.sig	r18=f93
		add		r17=r17,r16
		mov		carry3=0	}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r28,r27
	add		r29=r29,r28		};;
{ .mii;		getf.sig	r19=f84
		cmp.ltu		p7,p0=r17,r16	}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r29,r28
	add		r30=r30,r29		};;
{ .mii;		getf.sig	r20=f75
		add		r18=r18,r17	}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r30,r29
	add		r31=r31,r30		};;
{ .mfb;		getf.sig	r21=f66		}
{ .mii;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p0=r18,r17
		add		r19=r19,r18	}
{ .mfb;	nop.m	0x0				}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r31,r30
	add		r31=r31,carry2		};;
{ .mfb;		getf.sig	r22=f57		}
{ .mii;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p0=r19,r18
		add		r20=r20,r19	}
{ .mfb;	nop.m	0x0				}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r31,carry2	};;
{ .mfb;		getf.sig	r23=f48		}
{ .mii;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p0=r20,r19
		add		r21=r21,r20	}
{ .mii;
(p6)	add		carry1=1,carry1		}
{ .mfb;	st8		[r33]=r31,16		};;

{ .mfb;	getf.sig	r24=f112		}
{ .mii;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p0=r21,r20
		add		r22=r22,r21	};;
{ .mfb;	getf.sig	r25=f103		}
{ .mii;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p0=r22,r21
		add		r23=r23,r22	};;
{ .mfb;	getf.sig	r26=f94			}
{ .mii;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p0=r23,r22
		add		r23=r23,carry1	};;
{ .mfb;	getf.sig	r27=f85			}
{ .mii;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p8=r23,carry1};;
{ .mii;	getf.sig	r28=f76
	add		r25=r25,r24
	mov		carry1=0		}
{ .mii;		st8		[r32]=r23,16
	(p7)	add		carry2=1,carry3
	(p8)	add		carry2=0,carry3	};;

{ .mfb;	nop.m	0x0				}
{ .mii;	getf.sig	r29=f67
	cmp.ltu		p6,p0=r25,r24
	add		r26=r26,r25		};;
{ .mfb;	getf.sig	r30=f58			}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r26,r25
	add		r27=r27,r26		};;
{ .mfb;		getf.sig	r16=f113	}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r27,r26
	add		r28=r28,r27		};;
{ .mfb;		getf.sig	r17=f104	}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r28,r27
	add		r29=r29,r28		};;
{ .mfb;		getf.sig	r18=f95		}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r29,r28
	add		r30=r30,r29		};;
{ .mii;		getf.sig	r19=f86
		add		r17=r17,r16
		mov		carry3=0	}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r30,r29
	add		r30=r30,carry2		};;
{ .mii;		getf.sig	r20=f77
		cmp.ltu		p7,p0=r17,r16
		add		r18=r18,r17	}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r30,carry2	};;
{ .mfb;		getf.sig	r21=f68		}
{ .mii;	st8		[r33]=r30,16
(p6)	add		carry1=1,carry1		};;

{ .mfb;	getf.sig	r24=f114		}
{ .mii;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p0=r18,r17
		add		r19=r19,r18	};;
{ .mfb;	getf.sig	r25=f105		}
{ .mii;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p0=r19,r18
		add		r20=r20,r19	};;
{ .mfb;	getf.sig	r26=f96			}
{ .mii;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p0=r20,r19
		add		r21=r21,r20	};;
{ .mfb;	getf.sig	r27=f87			}
{ .mii;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p0=r21,r20
		add		r21=r21,carry1	};;
{ .mib;	getf.sig	r28=f78
	add		r25=r25,r24		}
{ .mib;	(p7)	add		carry3=1,carry3
		cmp.ltu		p7,p8=r21,carry1};;
{ .mii;		st8		[r32]=r21,16
	(p7)	add		carry2=1,carry3
	(p8)	add		carry2=0,carry3	}

{ .mii;	mov		carry1=0
	cmp.ltu		p6,p0=r25,r24
	add		r26=r26,r25		};;
{ .mfb;		getf.sig	r16=f115	}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r26,r25
	add		r27=r27,r26		};;
{ .mfb;		getf.sig	r17=f106	}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r27,r26
	add		r28=r28,r27		};;
{ .mfb;		getf.sig	r18=f97		}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r28,r27
	add		r28=r28,carry2		};;
{ .mib;		getf.sig	r19=f88
		add		r17=r17,r16	}
{ .mib;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r28,carry2	};;
{ .mii;	st8		[r33]=r28,16
(p6)	add		carry1=1,carry1		}

{ .mii;		mov		carry2=0
		cmp.ltu		p7,p0=r17,r16
		add		r18=r18,r17	};;
{ .mfb;	getf.sig	r24=f116		}
{ .mii;	(p7)	add		carry2=1,carry2
		cmp.ltu		p7,p0=r18,r17
		add		r19=r19,r18	};;
{ .mfb;	getf.sig	r25=f107		}
{ .mii;	(p7)	add		carry2=1,carry2
		cmp.ltu		p7,p0=r19,r18
		add		r19=r19,carry1	};;
{ .mfb;	getf.sig	r26=f98			}
{ .mii;	(p7)	add		carry2=1,carry2
		cmp.ltu		p7,p0=r19,carry1};;
{ .mii;		st8		[r32]=r19,16
	(p7)	add		carry2=1,carry2	}

{ .mfb;	add		r25=r25,r24		};;

{ .mfb;		getf.sig	r16=f117	}
{ .mii;	mov		carry1=0
	cmp.ltu		p6,p0=r25,r24
	add		r26=r26,r25		};;
{ .mfb;		getf.sig	r17=f108	}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r26,r25
	add		r26=r26,carry2		};;
{ .mfb;	nop.m	0x0				}
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r26,carry2	};;
{ .mii;	st8		[r33]=r26,16
(p6)	add		carry1=1,carry1		}

{ .mfb;		add		r17=r17,r16	};;
{ .mfb;	getf.sig	r24=f118		}
{ .mii;		mov		carry2=0
		cmp.ltu		p7,p0=r17,r16
		add		r17=r17,carry1	};;
{ .mii;	(p7)	add		carry2=1,carry2
		cmp.ltu		p7,p0=r17,carry1};;
{ .mii;		st8		[r32]=r17
	(p7)	add		carry2=1,carry2	};;
{ .mfb;	add		r24=r24,carry2		};;
{ .mib;	st8		[r33]=r24		}

{ .mib;	rum		1<<5		// clear um.mfh
	br.ret.sptk.many	b0	};;
.endp	bn_mul_comba8#
#undef	carry3
#undef	carry2
#undef	carry1
#endif

#if 1
// It's possible to make it faster (see the comment on bn_sqr_comba8),
// but I reckon it isn't worth the effort, mostly because the routine
// (actually both of them) is practically never called... So I just
// play the same trick as with bn_sqr_comba8: point b at a and branch
// into bn_mul_comba4 at .L_cheat_entry_point4.
//
// void bn_sqr_comba4(BN_ULONG *r, BN_ULONG *a)
//
.global	bn_sqr_comba4#
.proc	bn_sqr_comba4#
.align	64
bn_sqr_comba4:
	.prologue
	.save	ar.pfs,r2
#if defined(_HPUX_SOURCE) && !defined(_LP64)
{ .mii;	alloc   r2=ar.pfs,2,1,0,0
	addp4	r32=0,r32
	addp4	r33=0,r33		};;
{ .mii;
#else
{ .mii;	alloc	r2=ar.pfs,2,1,0,0
#endif
	mov	r34=r33
	add	r14=8,r33		};;
	.body
{ .mii;	add	r17=8,r34
	add	r15=16,r33
	add	r18=16,r34		}
{ .mfb;	add	r16=24,r33
	br	.L_cheat_entry_point4	};;
.endp	bn_sqr_comba4#
#endif

#if 1
// Runs in ~115 cycles, ~4.5 times faster than C. Well, whatever...
//
// void bn_mul_comba4(BN_ULONG *r, BN_ULONG *a, BN_ULONG *b)
//
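/*
 * A C reference model of the product this routine computes, assuming
 * 64-bit limbs: r[0..7] = a[0..3] * b[0..3], produced one column at a
 * time in Comba order, so r[k] collects every partial product a[i]*b[j]
 * with i+j == k in a three-word accumulator. The name mul_comba4_ref is
 * hypothetical, and the GCC/Clang extension unsigned __int128 stands in
 * for the xma.lu/xma.hu pairs. A sketch only; not part of the build.
 *
```c
#include <assert.h>
#include <stdint.h>

// r[0..7] = a[0..3] * b[0..3], computed column by column (Comba order).
static void mul_comba4_ref(uint64_t r[8], const uint64_t a[4],
                           const uint64_t b[4])
{
    uint64_t c0 = 0, c1 = 0, c2 = 0;   // three-word column accumulator

    for (int k = 0; k < 7; k++) {
        int lo = (k < 4) ? 0 : k - 3;  // keep both i and k-i in 0..3
        int hi = (k < 4) ? k : 3;

        for (int i = lo; i <= hi; i++) {
            unsigned __int128 t = (unsigned __int128)a[i] * b[k - i];
            uint64_t tl = (uint64_t)t;
            uint64_t th = (uint64_t)(t >> 64);  // at most 2^64 - 2

            c0 += tl;
            th += (c0 < tl);           // carry out of c0; cannot overflow th
            c1 += th;
            c2 += (c1 < th);           // carry out of c1
        }
        r[k] = c0;                     // emit the finished column
        c0 = c1;                       // shift the accumulator down
        c1 = c2;
        c2 = 0;
    }
    r[7] = c0;                         // top word of the 512-bit result
}
```
 * Squaring is the special case b == a, which is exactly how
 * bn_sqr_comba4 above reuses this routine.
 */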
#define	carry1	r14
#define	carry2	r15
.global	bn_mul_comba4#
.proc	bn_mul_comba4#
.align	64
bn_mul_comba4:
	.prologue
	.save	ar.pfs,r2
#if defined(_HPUX_SOURCE) && !defined(_LP64)
{ .mii;	alloc   r2=ar.pfs,3,0,0,0
	addp4	r33=0,r33
	addp4	r34=0,r34		};;
{ .mii;	addp4	r32=0,r32
#else
{ .mii;	alloc	r2=ar.pfs,3,0,0,0
#endif
	add	r14=8,r33
	add	r17=8,r34		}
	.body
{ .mii;	add	r15=16,r33
	add	r18=16,r34
	add	r16=24,r33		};;
.L_cheat_entry_point4:
{ .mmi;	add	r19=24,r34

	ldf8	f32=[r33]		}

{ .mmi;	ldf8	f120=[r34]
	ldf8	f121=[r17]		};;
{ .mmi;	ldf8	f122=[r18]
	ldf8	f123=[r19]		}

{ .mmi;	ldf8	f33=[r14]
	ldf8	f34=[r15]		}
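/*
 * A C model of the two multiply-add variants used from here on: for
 * unsigned 64-bit operands, "xma.lu f=a,b,c" yields the low 64 bits of
 * a*b+c and "xma.hu f=a,b,c" the high 64 bits. Because
 * a*b+c <= (2^64-1)^2 + (2^64-1) < 2^128, the sum never overflows, so
 * the code can feed the high half of one product in as the addend of
 * the next (f41 below); f0 reads as zero, so the first row is a plain
 * product. The helper names are hypothetical and unsigned __int128 is
 * a GCC/Clang extension. A sketch only; not part of the build.
 *
```c
#include <assert.h>
#include <stdint.h>

// Low 64 bits of the exact 128-bit value a*b + c (models xma.lu).
static uint64_t xma_lu(uint64_t a, uint64_t b, uint64_t c)
{
    return (uint64_t)((unsigned __int128)a * b + c);
}

// High 64 bits of the exact 128-bit value a*b + c (models xma.hu).
static uint64_t xma_hu(uint64_t a, uint64_t b, uint64_t c)
{
    return (uint64_t)(((unsigned __int128)a * b + c) >> 64);
}
```
 */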
{ .mfi;	ldf8	f35=[r16]

		xma.hu	f41=f32,f120,f0		}
{ .mfi;		xma.lu	f40=f32,f120,f0		};;
{ .mfi;		xma.hu	f51=f32,f121,f0		}
{ .mfi;		xma.lu	f50=f32,f121,f0		};;
{ .mfi;		xma.hu	f61=f32,f122,f0		}
{ .mfi;		xma.lu	f60=f32,f122,f0		};;
{ .mfi;		xma.hu	f71=f32,f123,f0		}
{ .mfi;		xma.lu	f70=f32,f123,f0		};;//
// A major stall takes place here and at 3 more places below: the result
// of the first xma is not available for another 3 ticks.
{ .mfi;	getf.sig	r16=f40
		xma.hu	f42=f33,f120,f41
	add		r33=8,r32		}
{ .mfi;		xma.lu	f41=f33,f120,f41	};;
{ .mfi;	getf.sig	r24=f50
		xma.hu	f52=f33,f121,f51	}
{ .mfi;		xma.lu	f51=f33,f121,f51	};;
{ .mfi;	st8		[r32]=r16,16
		xma.hu	f62=f33,f122,f61	}
{ .mfi;		xma.lu	f61=f33,f122,f61	};;
{ .mfi;		xma.hu	f72=f33,f123,f71	}
{ .mfi;		xma.lu	f71=f33,f123,f71	};;//
//-------------------------------------------------//
{ .mfi;	getf.sig	r25=f41
		xma.hu	f43=f34,f120,f42	}
{ .mfi;		xma.lu	f42=f34,f120,f42	};;
{ .mfi;	getf.sig	r16=f60
		xma.hu	f53=f34,f121,f52	}
{ .mfi;		xma.lu	f52=f34,f121,f52	};;
{ .mfi;	getf.sig	r17=f51
		xma.hu	f63=f34,f122,f62
	add		r25=r25,r24		}
{ .mfi;	mov		carry1=0
		xma.lu	f62=f34,f122,f62	};;
{ .mfi;	st8		[r33]=r25,16
		xma.hu	f73=f34,f123,f72
	cmp.ltu		p6,p0=r25,r24		}
{ .mfi;		xma.lu	f72=f34,f123,f72	};;//
//-------------------------------------------------//
{ .mfi;	getf.sig	r18=f42
		xma.hu	f44=f35,f120,f43
(p6)	add		carry1=1,carry1		}
{ .mfi;	add		r17=r17,r16
		xma.lu	f43=f35,f120,f43
	mov		carry2=0		};;
{ .mfi;	getf.sig	r24=f70
		xma.hu	f54=f35,f121,f53
	cmp.ltu		p7,p0=r17,r16		}
{ .mfi;		xma.lu	f53=f35,f121,f53	};;
{ .mfi;	getf.sig	r25=f61
		xma.hu	f64=f35,f122,f63
	add		r18=r18,r17		}
{ .mfi;		xma.lu	f63=f35,f122,f63
(p7)	add		carry2=1,carry2		};;
{ .mfi;	getf.sig	r26=f52
		xma.hu	f74=f35,f123,f73
	cmp.ltu		p7,p0=r18,r17		}
{ .mfi;		xma.lu	f73=f35,f123,f73
	add		r18=r18,carry1		};;
//-------------------------------------------------//
{ .mii;	st8		[r32]=r18,16
(p7)	add		carry2=1,carry2
	cmp.ltu		p7,p0=r18,carry1	};;

{ .mfi;	getf.sig	r27=f43	// last major stall
(p7)	add		carry2=1,carry2		};;
{ .mii;		getf.sig	r16=f71
	add		r25=r25,r24
	mov		carry1=0		};;
{ .mii;		getf.sig	r17=f62
	cmp.ltu		p6,p0=r25,r24
	add		r26=r26,r25		};;
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r26,r25
	add		r27=r27,r26		};;
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r27,r26
	add		r27=r27,carry2		};;
{ .mii;		getf.sig	r18=f53
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r27,carry2	};;
{ .mfi;	st8		[r33]=r27,16
(p6)	add		carry1=1,carry1		}

{ .mii;		getf.sig	r19=f44
		add		r17=r17,r16
		mov		carry2=0	};;
{ .mii;	getf.sig	r24=f72
		cmp.ltu		p7,p0=r17,r16
		add		r18=r18,r17	};;
{ .mii;	(p7)	add		carry2=1,carry2
		cmp.ltu		p7,p0=r18,r17
		add		r19=r19,r18	};;
{ .mii;	(p7)	add		carry2=1,carry2
		cmp.ltu		p7,p0=r19,r18
		add		r19=r19,carry1	};;
{ .mii;	getf.sig	r25=f63
	(p7)	add		carry2=1,carry2
		cmp.ltu		p7,p0=r19,carry1};;
{ .mii;		st8		[r32]=r19,16
	(p7)	add		carry2=1,carry2	}

{ .mii;	getf.sig	r26=f54
	add		r25=r25,r24
	mov		carry1=0		};;
{ .mii;		getf.sig	r16=f73
	cmp.ltu		p6,p0=r25,r24
	add		r26=r26,r25		};;
{ .mii;
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r26,r25
	add		r26=r26,carry2		};;
{ .mii;		getf.sig	r17=f64
(p6)	add		carry1=1,carry1
	cmp.ltu		p6,p0=r26,carry2	};;
{ .mii;	st8		[r33]=r26,16
(p6)	add		carry1=1,carry1		}

{ .mii;	getf.sig	r24=f74
		add		r17=r17,r16
		mov		carry2=0	};;
{ .mii;		cmp.ltu		p7,p0=r17,r16
		add		r17=r17,carry1	};;

{ .mii;	(p7)	add		carry2=1,carry2
		cmp.ltu		p7,p0=r17,carry1};;
{ .mii;		st8		[r32]=r17,16
	(p7)	add		carry2=1,carry2	};;

{ .mii;	add		r24=r24,carry2		};;
{ .mii;	st8		[r33]=r24		}

{ .mib;	rum		1<<5		// clear um.mfh
	br.ret.sptk.many	b0	};;
.endp	bn_mul_comba4#
#undef	carry2
#undef	carry1
#endif
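
// A C reference model of what bn_mul_comba4 computes, r[0..7] =
// a[0..3] * b[0..3], may help when reading the code above. It is an
// illustrative sketch, not part of the build: it assumes a 64-bit
// BN_ULONG (spelled uint64_t) and the GCC/Clang unsigned __int128
// extension, and bn_mul_comba4_ref is a hypothetical name. It uses the
// textbook column-wise (Comba) order; the assembly instead forms each
// row a[]*b[j] with xma.lu/xma.hu in the FP unit and then sums the rows
// column by column in the integer unit, counting carries in
// carry1/carry2. Both orders yield the same 512-bit product.
//
//	#include <stdint.h>
//
//	void bn_mul_comba4_ref(uint64_t r[8], const uint64_t a[4],
//			       const uint64_t b[4])
//	{
//		unsigned __int128 acc = 0;	/* low 128 bits of column sum */
//		uint64_t c2 = 0;		/* carries out of acc */
//		int i, k;
//
//		for (k = 0; k < 7; k++) {
//			for (i = 0; i < 4; i++) {
//				unsigned __int128 p;
//
//				if (k - i < 0 || k - i > 3)
//					continue;
//				p = (unsigned __int128)a[i] * b[k - i];
//				acc += p;
//				if (acc < p)	/* wrapped past 2^128 */
//					c2++;
//			}
//			r[k] = (uint64_t)acc;
//			/* carry the upper part into the next column */
//			acc = (acc >> 64) | ((unsigned __int128)c2 << 64);
//			c2 = 0;
//		}
//		r[7] = (uint64_t)acc;
//	}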

#if 1
//
// BN_ULONG bn_div_words(BN_ULONG h, BN_ULONG l, BN_ULONG d)
//
// In a nutshell, it's a port of my MIPS III/IV implementation.
//
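// For reference, the routine can be modelled in C roughly as below.
// This is an illustrative sketch, not part of the build: it assumes a
// 64-bit BN_ULONG (spelled uint64_t) and the GCC/Clang unsigned __int128
// extension, and bn_div_words_ref is a hypothetical name. It follows the
// assembly step by step: normalize d, reduce h once, then produce two
// 32-bit quotient digits, each estimated as h/dh and corrected downwards.
//
//	#include <stdint.h>
//	#include <stdlib.h>
//
//	uint64_t bn_div_words_ref(uint64_t h, uint64_t l, uint64_t d)
//	{
//		uint64_t dh, q = 0;
//		int i = 0, n;
//
//		if (d == 0)
//			return (uint64_t)-1;	/* mov r8=-1 early exit */
//		while ((int64_t)d > 0) {	/* .L_divw_shift */
//			d <<= 1;
//			i++;
//		}
//		if (i) {
//			if (h >> (64 - i))
//				abort();	/* overflow, die... */
//			h = (h << i) | (l >> (64 - i));
//			l <<= i;
//		}
//		if (h >= d)
//			h -= d;
//		dh = d >> 32;
//		for (n = 0; n < 2; n++) {	/* 1st and 2nd iteration */
//			uint64_t hh = h >> 32;
//			uint64_t qd = (hh == dh) ? 0xffffffff : h / dh;
//			unsigned __int128 t = (unsigned __int128)qd * d;
//			unsigned __int128 w = ((unsigned __int128)hh << 64)
//					      | ((h << 32) | (l >> 32));
//			while (t > w) {		/* correction loop */
//				t -= d;
//				qd--;
//			}
//			h = (uint64_t)(w - t);	/* partial remainder */
//			l <<= 32;
//			q = (q << 32) | qd;
//		}
//		return q;		/* remainder would be h >> i (r9) */
//	}
//
// When h < d on entry, the result equals ((h << 64) | l) / d.
//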
#define	AT	r14
#define	H	r16
#define	HH	r20
#define	L	r17
#define	D	r18
#define	DH	r22
#define	I	r21

#if 0
// Some preprocessors (most notably HP-UX's) appear to be allergic to
// macros enclosed in parentheses [as these three were].
#define	cont	p16
#define	break	p0	// p20
#define	equ	p24
#else
cont=p16
break=p0
equ=p24
#endif

.global	abort#
.global	bn_div_words#
.proc	bn_div_words#
.align	64
bn_div_words:
	.prologue
	.save	ar.pfs,r2
{ .mii;	alloc		r2=ar.pfs,3,5,0,8
	.save	b0,r3
	mov		r3=b0
	.save	pr,r10
	mov		r10=pr		};;
{ .mmb;	cmp.eq		p6,p0=r34,r0
	mov		r8=-1
(p6)	br.ret.spnt.many	b0	};;

	.body
{ .mii;	mov		H=r32		// save h
	mov		ar.ec=0		// don't rotate at exit
	mov		pr.rot=0	}
{ .mii;	mov		L=r33		// save l
	mov		r36=r0		};;

.L_divw_shift:	// -vv- note signed comparison
{ .mfi;	(p0)	cmp.lt		p16,p0=r0,r34	// d
	(p0)	shladd		r33=r34,1,r0	}
{ .mfb;	(p0)	add		r35=1,r36
	(p0)	nop.f		0x0
(p16)	br.wtop.dpnt		.L_divw_shift	};;
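//
// A rough C rendering of the loop above (a sketch; d and i stand for
// r34 and r36):
//
//	i = 0;
//	while ((int64_t)d > 0) {	/* signed test: stop at the top bit */
//		d <<= 1;
//		i++;
//	}
//
// It relies on register rotation: alloc reserved 8 rotating registers
// (r32-r39), and each taken br.wtop renames them so that the values
// written to r33 and r35 are read as r34 and r36 in the next pass. With
// ar.ec=0 the final, not-taken br.wtop does not rotate, which leaves the
// normalized divisor in r34 and the shift count in r36. The correction
// loops .L_divw_1st_iter and .L_divw_2nd_iter use the same trick to step
// the quotient digit (r32 -> r33) and tl (r34 -> r35) down on each pass.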

{ .mii;	mov		D=r34
	shr.u		DH=r34,32
	sub		r35=64,r36		};;
{ .mii;	setf.sig	f7=DH
	shr.u		AT=H,r35
	mov		I=r36			};;
{ .mib;	cmp.ne		p6,p0=r0,AT
	shl		H=H,r36
(p6)	br.call.spnt.clr	b0=abort	};;	// overflow, die...

{ .mfi;	fcvt.xuf.s1	f7=f7
	shr.u		AT=L,r35		};;
{ .mii;	shl		L=L,r36
	or		H=H,AT			};;

{ .mii;	nop.m		0x0
	cmp.leu		p6,p0=D,H;;
(p6)	sub		H=H,D			}

{ .mlx;	setf.sig	f14=D
	movl		AT=0xffffffff		};;
///////////////////////////////////////////////////////////
{ .mii;	setf.sig	f6=H
	shr.u		HH=H,32;;
	cmp.eq		p6,p7=HH,DH		};;
{ .mfb;
(p6)	setf.sig	f8=AT
(p7)	fcvt.xuf.s1	f6=f6
(p7)	br.call.sptk	b6=.L_udiv64_32_b6	};;

{ .mfi;	getf.sig	r33=f8				// q
	xmpy.lu		f9=f8,f14		}
{ .mfi;	xmpy.hu		f10=f8,f14
	shrp		H=H,L,32		};;

{ .mmi;	getf.sig	r35=f9				// tl
	getf.sig	r31=f10			};;	// th

.L_divw_1st_iter:
{ .mii;	(p0)	add		r32=-1,r33
	(p0)	cmp.eq		equ,cont=HH,r31		};;
{ .mii;	(p0)	cmp.ltu		p8,p0=r35,D
	(p0)	sub		r34=r35,D
	(equ)	cmp.leu		break,cont=r35,H	};;
{ .mib;	(cont)	cmp.leu		cont,break=HH,r31
	(p8)	add		r31=-1,r31
(cont)	br.wtop.spnt		.L_divw_1st_iter	};;
///////////////////////////////////////////////////////////
{ .mii;	sub		H=H,r35
	shl		r8=r33,32
	shl		L=L,32			};;
///////////////////////////////////////////////////////////
{ .mii;	setf.sig	f6=H
	shr.u		HH=H,32;;
	cmp.eq		p6,p7=HH,DH		};;
{ .mfb;
(p6)	setf.sig	f8=AT
(p7)	fcvt.xuf.s1	f6=f6
(p7)	br.call.sptk	b6=.L_udiv64_32_b6	};;

{ .mfi;	getf.sig	r33=f8				// q
	xmpy.lu		f9=f8,f14		}
{ .mfi;	xmpy.hu		f10=f8,f14
	shrp		H=H,L,32		};;

{ .mmi;	getf.sig	r35=f9				// tl
	getf.sig	r31=f10			};;	// th

.L_divw_2nd_iter:
{ .mii;	(p0)	add		r32=-1,r33
	(p0)	cmp.eq		equ,cont=HH,r31		};;
{ .mii;	(p0)	cmp.ltu		p8,p0=r35,D
	(p0)	sub		r34=r35,D
	(equ)	cmp.leu		break,cont=r35,H	};;
{ .mib;	(cont)	cmp.leu		cont,break=HH,r31
	(p8)	add		r31=-1,r31
(cont)	br.wtop.spnt		.L_divw_2nd_iter	};;
///////////////////////////////////////////////////////////
{ .mii;	sub	H=H,r35
	or	r8=r8,r33
	mov	ar.pfs=r2		};;
{ .mii;	shr.u	r9=H,I			// remainder if anybody wants it
	mov	pr=r10,0x1ffff		}
{ .mfb;	br.ret.sptk.many	b0	};;

// Unsigned 64-by-32-bit (well, 64-by-64-bit for the moment) integer
// division procedure.
//
// inputs:	f6 = (double)a, f7 = (double)b
// output:	f8 = (int)(a/b)
// clobbered:	f8,f9,f10,f11,pred
pred=p15
// One can argue that this snippet is copyrighted by Intel
// Corporation, as it's essentially identical to one of those
// found in the "Divide, Square Root and Remainder" section at
// http://www.intel.com/software/products/opensource/libraries/num.htm.
// Yes, I admit that the referenced code was used as a template,
// but only after I realized that there is hardly any other instruction
// sequence that would perform this operation. I figure that any
// independent attempt to implement high-performance division would
// result in code virtually identical to Intel's. It should be noted,
// though, that the division kernel below is 1 cycle faster than
// Intel's (note the commented-out stops, //;; :-), not to mention the
// original prologue (or rather lack of one) and epilogue.
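//
// The refinement below can be checked algebraically. Writing
// y0 = (1 - e0)/b for the frcpa estimate and ignoring the rounding of
// the individual steps:
//
//	q0 = a*y0		= (a/b)(1 - e0)
//	q1 = q0 + e0*q0		= (a/b)(1 - e0^2)
//	y1 = y0 + e0*y0		= (1 - e0^2)/b
//	q2 = q1 + e1*q1		= (a/b)(1 - e0^4)	(e1 = e0^2)
//	y2 = y1 + e1*y1		= (1 - e0^4)/b
//	r2 = a - b*q2		= a*e0^4
//	q3 = q2 + r2*y2		= (a/b)(1 - e0^8)
//
// so the relative error is squared at each stage. frcpa's estimate is
// accurate to roughly 8.9 bits, which puts |e0^8| below about 2^-71; a
// full proof of correctness must also account for the rounding of each
// fma, which this sketch does not attempt.
//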
.align	32
.skip	16
.L_udiv64_32_b6:
	frcpa.s1	f8,pred=f6,f7;;		// [0]  y0 = 1 / b

(pred)	fnma.s1		f9=f7,f8,f1		// [5]  e0 = 1 - b * y0
(pred)	fmpy.s1		f10=f6,f8;;		// [5]  q0 = a * y0
(pred)	fmpy.s1		f11=f9,f9		// [10] e1 = e0 * e0
(pred)	fma.s1		f10=f9,f10,f10;;	// [10] q1 = q0 + e0 * q0
(pred)	fma.s1		f8=f9,f8,f8	//;;	// [15] y1 = y0 + e0 * y0
(pred)	fma.s1		f9=f11,f10,f10;;	// [15] q2 = q1 + e1 * q1
(pred)	fma.s1		f8=f11,f8,f8	//;;	// [20] y2 = y1 + e1 * y1
(pred)	fnma.s1		f10=f7,f9,f6;;		// [20] r2 = a - b * q2
(pred)	fma.s1		f8=f10,f8,f9;;		// [25] q3 = q2 + r2 * y2

	fcvt.fxu.trunc.s1	f8=f8		// [30] q = trunc(q3)
	br.ret.sptk.many	b6;;
.endp	bn_div_words#
#endif