novelai-storage / Basedformer · Commits

Commit 6a53993a, authored Aug 26, 2022 by kurumuz

terrible hack :?

parent c20db765
Showing 1 changed file with 38 additions and 36 deletions.

basedformer/models/ds_strats.py  (+38 −36)  — view file @ 6a53993a
The diff re-nests BasedformerGPTJLayerPolicy: previously defined at module level (with GPTJTransform after it), the class is now defined inside GPTJTransform itself, its body otherwise unchanged apart from indentation. At 6a53993a the file reads:

from deepspeed.module_inject import DSPolicy
import torch
from torch.nn.parameter import Parameter
from basedformer import models

def GPTJTransform(model):
    class BasedformerGPTJLayerPolicy(DSPolicy):
        _orig_layer_class = None
        #can't have original layer class because in transformerfork all models are just one class
        #needs some config from the model.config, including:
        #rotary_dim, layer_norm_epsilon

        def __init__(self, client_module, inference=True):
            super().__init__(inference, scale_attention=True)
            self.client_module = client_module

        def get_hidden_heads(self):
            return self.client_module.attn.q_proj.weight.shape[1], \
                   self.client_module.attn.n_head

        def attention(self):
            qw = self.client_module.attn.q_proj.weight
            kw = self.client_module.attn.k_proj.weight
            vw = self.client_module.attn.v_proj.weight

            qkvw = Parameter(torch.cat((qw, kw, vw), dim=0), requires_grad=False)

            return self.linear_layer, \
                   qkvw, \
                   None, \
                   self.client_module.attn.out_proj.weight, \
                   None, \
                   self.scale_attention, \
                   self.is_megatron_v2

        def mlp(self):
            return self.linear_layer, \
                   self.client_module.ff.ff1.weight, \
                   self.client_module.ff.ff1.bias, \
                   self.client_module.ff.ff2.weight, \
                   self.client_module.ff.ff2.bias

        def layerNorm(self):
            return None, \
                   None, \
                   self.client_module.ln_preattn.weight, \
                   self.client_module.ln_preattn.bias

    model.config.rotary_dim = model.layers[0].attn.rotary_dim
    model.config.layer_norm_epsilon = 1e-5
...
@@ -52,6 +53,7 @@ def GPTJTransform(model):
    model.get_embeds = model.get_embeds_ds
    import deepspeed
    from deepspeed.module_inject import DSPolicy
    model = deepspeed.init_inference(model,
                                     mp_size=1,
...
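Aside (not part of the commit): the attention() method above fuses the separate q/k/v projection weights into a single matrix so DeepSpeed can run one fused QKV projection. A minimal, self-contained sketch of that fusion with toy tensors — the hidden size and the qw/kw/vw values below are stand-ins, not anything read from basedformer:

import torch
from torch.nn.parameter import Parameter

hidden = 8  # toy hidden size; stand-in for the model's real hidden dimension

# Stand-ins for client_module.attn.{q,k,v}_proj.weight, each [hidden, hidden].
qw = torch.randn(hidden, hidden)
kw = torch.randn(hidden, hidden)
vw = torch.randn(hidden, hidden)

# Same fusion as in BasedformerGPTJLayerPolicy.attention(): concatenate along
# dim=0 into one [3*hidden, hidden] matrix so a single matmul yields Q, K and V.
qkvw = Parameter(torch.cat((qw, kw, vw), dim=0), requires_grad=False)

x = torch.randn(1, hidden)                       # one token's hidden state
q, k, v = (x @ qkvw.t()).split(hidden, dim=-1)   # fused QKV projection
assert torch.allclose(q, x @ qw.t(), atol=1e-6)  # matches the separate q_proj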
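The deepspeed.init_inference call in the second hunk is cut off after mp_size=1. A hedged sketch of what such a call typically looks like — the dtype and kernel-injection arguments below are assumptions for illustration, not what ds_strats.py actually passes:

import torch
import deepspeed

def init_ds_inference(model):
    # Plausible shape of the call; everything beyond mp_size=1 is assumed,
    # since the diff truncates the real invocation inside GPTJTransform.
    return deepspeed.init_inference(
        model,
        mp_size=1,                        # no tensor-model parallelism
        dtype=torch.float16,              # assumption: run fused kernels in fp16
        replace_with_kernel_inject=True,  # assumption: swap layers for DeepSpeed kernels
    )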