larq.optimizers
Neural networks with extremely low-precision weights and activations, such as Binarized Neural Networks (BNNs), usually contain a mix of low-precision weights (e.g. 1-bit) and higher-precision weights (e.g. 8-bit, 16-bit, or 32-bit). Examples of this include the first and last layers of image classification models, which have higher-precision weights in most BNN architectures from the literature.
Training a BNN, then, consists of optimizing both low-precision and higher-precision weights. In `larq`, we provide a mechanism to target different bit-precision variables with different optimizers using the `CaseOptimizer` class. Modeled after the `tf.case` signature, `CaseOptimizer` accepts pairs of predicates and optimizers. A predicate, given a variable, decides whether its optimizer should train that variable.
A `CaseOptimizer` behaves much like any other Keras optimizer, and once you have instantiated it you can pass it to `model.compile()` as usual. To instantiate a `CaseOptimizer`, pass one or more `(predicate, optimizer)` tuples, along with a `default_optimizer` that trains any variables not claimed by another optimizer. A variable may not be claimed by more than one optimizer's predicate.
Example
```python
import tensorflow as tf
import larq as lq

# NoOp marks the kernel as binary (precision=1) without changing its values.
no_op_quantizer = lq.quantizers.NoOp(precision=1)
layer = lq.layers.QuantDense(16, kernel_quantizer=no_op_quantizer)

case_optimizer = lq.optimizers.CaseOptimizer(
    (
        lq.optimizers.Bop.is_binary_variable,  # predicate
        lq.optimizers.Bop(threshold=1e-6, gamma=1e-3),  # optimizer
    ),
    default_optimizer=tf.keras.optimizers.Adam(0.01),
)
```
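Once constructed, the `CaseOptimizer` is passed to `model.compile()` like any other Keras optimizer. A minimal sketch, assuming a toy classification model built around the `layer` defined above (the architecture, loss, and input shape are illustrative, not prescribed by `larq`):

```python
# Illustrative model: the binarized QuantDense layer from the example above,
# wrapped in a small classifier. Architecture and loss are assumptions.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    layer,  # kernel is claimed by Bop via Bop.is_binary_variable
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Bop trains the binary kernel; Adam (the default_optimizer) trains the rest.
model.compile(
    optimizer=case_optimizer,
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```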
CaseOptimizer
```python
larq.optimizers.CaseOptimizer(
    *predicate_optimizer_pairs, default_optimizer=None, name="optimizer_case"
)
```
An optimizer wrapper that applies different optimizers to different subsets of variables.

An optimizer is used to train a variable iff its accompanying predicate evaluates to `True`. For each variable, at most one optimizer's predicate may evaluate to `True`. If no optimizer's predicate evaluates to `True` for a variable, it is trained with the `default_optimizer`. If a variable is claimed by no optimizer and `default_optimizer` is `None`, the variable is not trained.
Arguments
- `predicate_optimizer_pairs` (`Tuple[Callable[[tf.Variable], bool], tf.keras.optimizers.Optimizer]`): One or more `(pred, tf.keras.optimizers.Optimizer)` pairs, where `pred` takes one `tf.Variable` as argument and returns `True` if the optimizer should be used for that variable, e.g. `pred(var) == True`.
- `default_optimizer` (`Optional[tf.keras.optimizers.Optimizer]`): A `tf.keras.optimizers.Optimizer` to be applied to any variable not claimed by any other optimizer. (Must be passed as a keyword argument.)
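Predicates are ordinary callables with the signature `(tf.Variable) -> bool`, so variables can be routed to optimizers by any rule, as long as no variable is claimed by more than one predicate. A minimal sketch using a hypothetical name-based predicate (only `Bop.is_binary_variable` is provided by `larq`; the name-matching rule below is purely illustrative):

```python
import tensorflow as tf
import larq as lq

# Hypothetical predicate: claim any variable whose name contains "depthwise".
# Only the (tf.Variable) -> bool signature matters; predicates must not overlap.
def is_depthwise_variable(variable):
    return "depthwise" in variable.name

case_optimizer = lq.optimizers.CaseOptimizer(
    (is_depthwise_variable, tf.keras.optimizers.SGD(0.1)),
    (lq.optimizers.Bop.is_binary_variable, lq.optimizers.Bop()),
    # Variables claimed by neither predicate fall through to Adam;
    # with default_optimizer=None they would not be trained at all.
    default_optimizer=tf.keras.optimizers.Adam(0.01),
)
```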
Bop
```python
larq.optimizers.Bop(threshold=1e-08, gamma=0.0001, name="Bop", **kwargs)
```
Binary optimizer (Bop).
Bop is a latent-free optimizer for Binarized Neural Networks (BNNs) and Binary Weight Networks (BWN).
Bop maintains an exponential moving average of the gradients controlled by `gamma`. If this average exceeds the `threshold`, a weight is flipped.

The hyperparameter `gamma` is somewhat analogous to the learning rate in SGD methods: a high `gamma` results in rapid convergence but also makes training more noisy.

Note that the default `threshold` is not optimal for all situations. Setting it too high results in little learning, while setting it too low results in overly noisy behaviour.
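As a rough illustration of this update rule, here is a NumPy sketch based on the Bop paper (illustrative only; the actual `larq` implementation may differ in its details):

```python
import numpy as np

def bop_step(weights, grads, momentum, gamma=1e-4, threshold=1e-8):
    """One illustrative Bop step for a vector of binary (+1/-1) weights."""
    # Exponential moving average of the gradients, controlled by gamma.
    momentum = (1 - gamma) * momentum + gamma * grads
    # Flip a weight when the averaged gradient signal exceeds the threshold
    # and agrees in sign with the weight (i.e. pushes it towards the other sign).
    flip = (np.abs(momentum) > threshold) & (np.sign(momentum) == np.sign(weights))
    weights = np.where(flip, -weights, weights)
    return weights, momentum
```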
Warning
The `is_binary_variable` check of this optimizer will only target variables that have been explicitly marked as being binary using `NoOp(precision=1)`.
Example
```python
import tensorflow as tf
import larq as lq

no_op_quantizer = lq.quantizers.NoOp(precision=1)
layer = lq.layers.QuantDense(16, kernel_quantizer=no_op_quantizer)

optimizer = lq.optimizers.CaseOptimizer(
    (lq.optimizers.Bop.is_binary_variable, lq.optimizers.Bop()),
    default_optimizer=tf.keras.optimizers.Adam(0.01),  # for FP weights
)
```
Arguments
- `threshold` (`float`): magnitude of average gradient signal required to flip a weight.
- `gamma` (`float`): the adaptivity rate.
- `name` (`str`): name of the optimizer.
References