1 // Copyright 2020, The Android Open Source Project
2 //
3 // Licensed under the Apache License, Version 2.0 (the "License");
4 // you may not use this file except in compliance with the License.
5 // You may obtain a copy of the License at
6 //
7 //     http://www.apache.org/licenses/LICENSE-2.0
8 //
9 // Unless required by applicable law or agreed to in writing, software
10 // distributed under the License is distributed on an "AS IS" BASIS,
11 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 // See the License for the specific language governing permissions and
13 // limitations under the License.
14 
15 //! This crate implements the `IKeystoreOperation` AIDL interface, which represents
16 //! an ongoing key operation, as well as the operation database, which is mainly
17 //! required for tracking operations for the purpose of pruning.
18 //! This crate also implements an operation pruning strategy.
19 //!
20 //! Operations implement the API calls update, finish, and abort.
21 //! Additionally, an operation can be dropped and pruned. The former
22 //! happens if the client deletes a binder to the operation object.
23 //! An existing operation may get pruned when running out of operation
24 //! slots and a new operation takes precedence.
25 //!
26 //! ## Operation Lifecycle
27 //! An operation gets created when the client calls `IKeystoreSecurityLevel::create`.
28 //! It may receive zero or more update request. The lifecycle ends when:
29 //!  * `update` yields an error.
30 //!  * `finish` is called.
31 //!  * `abort` is called.
32 //!  * The operation gets dropped.
33 //!  * The operation gets pruned.
34 //! `Operation` has an `Outcome` member. While the outcome is `Outcome::Unknown`,
35 //! the operation is active and in a good state. Any of the above conditions may
36 //! change the outcome to one of the defined outcomes Success, Abort, Dropped,
37 //! Pruned, or ErrorCode. The latter is chosen in the case of an unexpected error, during
38 //! `update` or `finish`. `Success` is chosen iff `finish` completes without error.
39 //! Note that all operations get dropped eventually in the sense that they lose
40 //! their last reference and get destroyed. At that point, the fate of the operation
41 //! gets logged. However, an operation will transition to `Outcome::Dropped` iff
42 //! the operation was still active (`Outcome::Unknown`) at that time.
43 //!
44 //! ## Operation Dropping
45 //! To observe the dropping of an operation, we have to make sure that there
46 //! are no strong references to the IBinder representing this operation.
47 //! This would be simple enough if the operation object would need to be accessed
48 //! only by transactions. But to perform pruning, we have to retain a reference to the
49 //! original operation object.
50 //!
51 //! ## Operation Pruning
52 //! Pruning an operation happens during the creation of a new operation.
53 //! We have to iterate through the operation database to find a suitable
54 //! candidate. Then we abort and finalize this operation setting its outcome to
55 //! `Outcome::Pruned`. The corresponding KeyMint operation slot will have been freed
56 //! up at this point, but the `Operation` object lingers. When the client
57 //! attempts to use the operation again they will receive
58 //! ErrorCode::INVALID_OPERATION_HANDLE indicating that the operation no longer
59 //! exits. This should be the cue for the client to destroy its binder.
60 //! At that point the operation gets dropped.
61 //!
62 //! ## Architecture
63 //! The `IKeystoreOperation` trait is implemented by `KeystoreOperation`.
64 //! This acts as a proxy object holding a strong reference to actual operation
65 //! implementation `Operation`.
66 //!
67 //! ```
68 //! struct KeystoreOperation {
69 //!     operation: Mutex<Option<Arc<Operation>>>,
70 //! }
71 //! ```
72 //!
73 //! The `Mutex` serves two purposes. It provides interior mutability allowing
74 //! us to set the Option to None. We do this when the life cycle ends during
75 //! a call to `update`, `finish`, or `abort`. As a result most of the Operation
76 //! related resources are freed. The `KeystoreOperation` proxy object still
77 //! lingers until dropped by the client.
78 //! The second purpose is to protect operations against concurrent usage.
79 //! Failing to lock this mutex yields `ResponseCode::OPERATION_BUSY` and indicates
80 //! a programming error in the client.
81 //!
82 //! Note that the Mutex only protects the operation against concurrent client calls.
83 //! We still retain weak references to the operation in the operation database:
84 //!
85 //! ```
86 //! struct OperationDb {
87 //!     operations: Mutex<Vec<Weak<Operation>>>
88 //! }
89 //! ```
90 //!
91 //! This allows us to access the operations for the purpose of pruning.
92 //! We do this in three phases.
93 //!  1. We gather the pruning information. Besides non mutable information,
94 //!     we access `last_usage` which is protected by a mutex.
95 //!     We only lock this mutex for single statements at a time. During
96 //!     this phase we hold the operation db lock.
97 //!  2. We choose a pruning candidate by computing the pruning resistance
98 //!     of each operation. We do this entirely with information we now
99 //!     have on the stack without holding any locks.
100 //!     (See `OperationDb::prune` for more details on the pruning strategy.)
101 //!  3. During pruning we briefly lock the operation database again to get the
102 //!     the pruning candidate by index. We then attempt to abort the candidate.
103 //!     If the candidate was touched in the meantime or is currently fulfilling
104 //!     a request (i.e., the client calls update, finish, or abort),
105 //!     we go back to 1 and try again.
106 //!
107 //! So the outer Mutex in `KeystoreOperation::operation` only protects
108 //! operations against concurrent client calls but not against concurrent
109 //! pruning attempts. This is what the `Operation::outcome` mutex is used for.
110 //!
111 //! ```
112 //! struct Operation {
113 //!     ...
114 //!     outcome: Mutex<Outcome>,
115 //!     ...
116 //! }
117 //! ```
118 //!
119 //! Any request that can change the outcome, i.e., `update`, `finish`, `abort`,
120 //! `drop`, and `prune` has to take the outcome lock and check if the outcome
121 //! is still `Outcome::Unknown` before entering. `prune` is special in that
122 //! it will `try_lock`, because we don't want to be blocked on a potentially
123 //! long running request at another operation. If it fails to get the lock
124 //! the operation is either being touched, which changes its pruning resistance,
125 //! or it transitions to its end-of-life, which means we may get a free slot.
126 //! Either way, we have to revaluate the pruning scores.
127 
128 use crate::enforcements::AuthInfo;
129 use crate::error::{map_err_with, map_km_error, map_or_log_err, Error, ErrorCode, ResponseCode};
130 use crate::metrics_store::log_key_operation_event_stats;
131 use crate::utils::{watchdog as wd, Asp};
132 use android_hardware_security_keymint::aidl::android::hardware::security::keymint::{
133     IKeyMintOperation::IKeyMintOperation, KeyParameter::KeyParameter, KeyPurpose::KeyPurpose,
134     SecurityLevel::SecurityLevel,
135 };
136 use android_hardware_security_keymint::binder::BinderFeatures;
137 use android_system_keystore2::aidl::android::system::keystore2::{
138     IKeystoreOperation::BnKeystoreOperation, IKeystoreOperation::IKeystoreOperation,
139 };
140 use anyhow::{anyhow, Context, Result};
141 use std::{
142     collections::HashMap,
143     sync::{Arc, Mutex, MutexGuard, Weak},
144     time::Duration,
145     time::Instant,
146 };
147 
148 /// Operations have `Outcome::Unknown` as long as they are active. They transition
149 /// to one of the other variants exactly once. The distinction in outcome is mainly
150 /// for the statistic.
151 #[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd)]
152 pub enum Outcome {
153     /// Operations have `Outcome::Unknown` as long as they are active.
154     Unknown,
155     /// Operation is successful.
156     Success,
157     /// Operation is aborted.
158     Abort,
159     /// Operation is dropped.
160     Dropped,
161     /// Operation is pruned.
162     Pruned,
163     /// Operation is failed with the error code.
164     ErrorCode(ErrorCode),
165 }
166 
167 /// Operation bundles all of the operation related resources and tracks the operation's
168 /// outcome.
169 #[derive(Debug)]
170 pub struct Operation {
171     // The index of this operation in the OperationDb.
172     index: usize,
173     km_op: Asp,
174     last_usage: Mutex<Instant>,
175     outcome: Mutex<Outcome>,
176     owner: u32, // Uid of the operation's owner.
177     auth_info: Mutex<AuthInfo>,
178     forced: bool,
179     logging_info: LoggingInfo,
180 }
181 
182 /// Keeps track of the information required for logging operations.
183 #[derive(Debug)]
184 pub struct LoggingInfo {
185     sec_level: SecurityLevel,
186     purpose: KeyPurpose,
187     op_params: Vec<KeyParameter>,
188     key_upgraded: bool,
189 }
190 
191 impl LoggingInfo {
192     /// Constructor
new( sec_level: SecurityLevel, purpose: KeyPurpose, op_params: Vec<KeyParameter>, key_upgraded: bool, ) -> LoggingInfo193     pub fn new(
194         sec_level: SecurityLevel,
195         purpose: KeyPurpose,
196         op_params: Vec<KeyParameter>,
197         key_upgraded: bool,
198     ) -> LoggingInfo {
199         Self { sec_level, purpose, op_params, key_upgraded }
200     }
201 }
202 
203 struct PruningInfo {
204     last_usage: Instant,
205     owner: u32,
206     index: usize,
207     forced: bool,
208 }
209 
210 // We don't except more than 32KiB of data in `update`, `updateAad`, and `finish`.
211 const MAX_RECEIVE_DATA: usize = 0x8000;
212 
213 impl Operation {
214     /// Constructor
new( index: usize, km_op: binder::Strong<dyn IKeyMintOperation>, owner: u32, auth_info: AuthInfo, forced: bool, logging_info: LoggingInfo, ) -> Self215     pub fn new(
216         index: usize,
217         km_op: binder::Strong<dyn IKeyMintOperation>,
218         owner: u32,
219         auth_info: AuthInfo,
220         forced: bool,
221         logging_info: LoggingInfo,
222     ) -> Self {
223         Self {
224             index,
225             km_op: Asp::new(km_op.as_binder()),
226             last_usage: Mutex::new(Instant::now()),
227             outcome: Mutex::new(Outcome::Unknown),
228             owner,
229             auth_info: Mutex::new(auth_info),
230             forced,
231             logging_info,
232         }
233     }
234 
get_pruning_info(&self) -> Option<PruningInfo>235     fn get_pruning_info(&self) -> Option<PruningInfo> {
236         // An operation may be finalized.
237         if let Ok(guard) = self.outcome.try_lock() {
238             match *guard {
239                 Outcome::Unknown => {}
240                 // If the outcome is any other than unknown, it has been finalized,
241                 // and we can no longer consider it for pruning.
242                 _ => return None,
243             }
244         }
245         // Else: If we could not grab the lock, this means that the operation is currently
246         //       being used and it may be transitioning to finalized or it was simply updated.
247         //       In any case it is fair game to consider it for pruning. If the operation
248         //       transitioned to a final state, we will notice when we attempt to prune, and
249         //       a subsequent attempt to create a new operation will succeed.
250         Some(PruningInfo {
251             // Expect safety:
252             // `last_usage` is locked only for primitive single line statements.
253             // There is no chance to panic and poison the mutex.
254             last_usage: *self.last_usage.lock().expect("In get_pruning_info."),
255             owner: self.owner,
256             index: self.index,
257             forced: self.forced,
258         })
259     }
260 
prune(&self, last_usage: Instant) -> Result<(), Error>261     fn prune(&self, last_usage: Instant) -> Result<(), Error> {
262         let mut locked_outcome = match self.outcome.try_lock() {
263             Ok(guard) => match *guard {
264                 Outcome::Unknown => guard,
265                 _ => return Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)),
266             },
267             Err(_) => return Err(Error::Rc(ResponseCode::OPERATION_BUSY)),
268         };
269 
270         // In `OperationDb::prune`, which is our caller, we first gather the pruning
271         // information including the last usage. When we select a candidate
272         // we call `prune` on that candidate passing the last_usage
273         // that we gathered earlier. If the actual last usage
274         // has changed since than, it means the operation was busy in the
275         // meantime, which means that we have to reevaluate the pruning score.
276         //
277         // Expect safety:
278         // `last_usage` is locked only for primitive single line statements.
279         // There is no chance to panic and poison the mutex.
280         if *self.last_usage.lock().expect("In Operation::prune()") != last_usage {
281             return Err(Error::Rc(ResponseCode::OPERATION_BUSY));
282         }
283         *locked_outcome = Outcome::Pruned;
284 
285         let km_op: binder::public_api::Strong<dyn IKeyMintOperation> =
286             match self.km_op.get_interface() {
287                 Ok(km_op) => km_op,
288                 Err(e) => {
289                     log::error!("In prune: Failed to get KeyMintOperation interface.\n    {:?}", e);
290                     return Err(Error::sys());
291                 }
292             };
293 
294         let _wp = wd::watch_millis("In Operation::prune: calling abort()", 500);
295 
296         // We abort the operation. If there was an error we log it but ignore it.
297         if let Err(e) = map_km_error(km_op.abort()) {
298             log::error!("In prune: KeyMint::abort failed with {:?}.", e);
299         }
300 
301         Ok(())
302     }
303 
304     // This function takes a Result from a KeyMint call and inspects it for errors.
305     // If an error was found it updates the given `locked_outcome` accordingly.
306     // It forwards the Result unmodified.
307     // The precondition to this call must be *locked_outcome == Outcome::Unknown.
308     // Ideally the `locked_outcome` came from a successful call to `check_active`
309     // see below.
update_outcome<T>( &self, locked_outcome: &mut Outcome, err: Result<T, Error>, ) -> Result<T, Error>310     fn update_outcome<T>(
311         &self,
312         locked_outcome: &mut Outcome,
313         err: Result<T, Error>,
314     ) -> Result<T, Error> {
315         match &err {
316             Err(Error::Km(e)) => *locked_outcome = Outcome::ErrorCode(*e),
317             Err(_) => *locked_outcome = Outcome::ErrorCode(ErrorCode::UNKNOWN_ERROR),
318             Ok(_) => (),
319         }
320         err
321     }
322 
323     // This function grabs the outcome lock and checks the current outcome state.
324     // If the outcome is still `Outcome::Unknown`, this function returns
325     // the locked outcome for further updates. In any other case it returns
326     // ErrorCode::INVALID_OPERATION_HANDLE indicating that this operation has
327     // been finalized and is no longer active.
check_active(&self) -> Result<MutexGuard<Outcome>>328     fn check_active(&self) -> Result<MutexGuard<Outcome>> {
329         let guard = self.outcome.lock().expect("In check_active.");
330         match *guard {
331             Outcome::Unknown => Ok(guard),
332             _ => Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)).context(format!(
333                 "In check_active: Call on finalized operation with outcome: {:?}.",
334                 *guard
335             )),
336         }
337     }
338 
339     // This function checks the amount of input data sent to us. We reject any buffer
340     // exceeding MAX_RECEIVE_DATA bytes as input to `update`, `update_aad`, and `finish`
341     // in order to force clients into using reasonable limits.
check_input_length(data: &[u8]) -> Result<()>342     fn check_input_length(data: &[u8]) -> Result<()> {
343         if data.len() > MAX_RECEIVE_DATA {
344             // This error code is unique, no context required here.
345             return Err(anyhow!(Error::Rc(ResponseCode::TOO_MUCH_DATA)));
346         }
347         Ok(())
348     }
349 
350     // Update the last usage to now.
touch(&self)351     fn touch(&self) {
352         // Expect safety:
353         // `last_usage` is locked only for primitive single line statements.
354         // There is no chance to panic and poison the mutex.
355         *self.last_usage.lock().expect("In touch.") = Instant::now();
356     }
357 
358     /// Implementation of `IKeystoreOperation::updateAad`.
359     /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details.
update_aad(&self, aad_input: &[u8]) -> Result<()>360     fn update_aad(&self, aad_input: &[u8]) -> Result<()> {
361         let mut outcome = self.check_active().context("In update_aad")?;
362         Self::check_input_length(aad_input).context("In update_aad")?;
363         self.touch();
364 
365         let km_op: binder::public_api::Strong<dyn IKeyMintOperation> =
366             self.km_op.get_interface().context("In update: Failed to get KeyMintOperation.")?;
367 
368         let (hat, tst) = self
369             .auth_info
370             .lock()
371             .unwrap()
372             .before_update()
373             .context("In update_aad: Trying to get auth tokens.")?;
374 
375         self.update_outcome(&mut *outcome, {
376             let _wp = wd::watch_millis("Operation::update_aad: calling updateAad", 500);
377             map_km_error(km_op.updateAad(aad_input, hat.as_ref(), tst.as_ref()))
378         })
379         .context("In update_aad: KeyMint::update failed.")?;
380 
381         Ok(())
382     }
383 
384     /// Implementation of `IKeystoreOperation::update`.
385     /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details.
update(&self, input: &[u8]) -> Result<Option<Vec<u8>>>386     fn update(&self, input: &[u8]) -> Result<Option<Vec<u8>>> {
387         let mut outcome = self.check_active().context("In update")?;
388         Self::check_input_length(input).context("In update")?;
389         self.touch();
390 
391         let km_op: binder::public_api::Strong<dyn IKeyMintOperation> =
392             self.km_op.get_interface().context("In update: Failed to get KeyMintOperation.")?;
393 
394         let (hat, tst) = self
395             .auth_info
396             .lock()
397             .unwrap()
398             .before_update()
399             .context("In update: Trying to get auth tokens.")?;
400 
401         let output = self
402             .update_outcome(&mut *outcome, {
403                 let _wp = wd::watch_millis("Operation::update: calling update", 500);
404                 map_km_error(km_op.update(input, hat.as_ref(), tst.as_ref()))
405             })
406             .context("In update: KeyMint::update failed.")?;
407 
408         if output.is_empty() {
409             Ok(None)
410         } else {
411             Ok(Some(output))
412         }
413     }
414 
415     /// Implementation of `IKeystoreOperation::finish`.
416     /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details.
finish(&self, input: Option<&[u8]>, signature: Option<&[u8]>) -> Result<Option<Vec<u8>>>417     fn finish(&self, input: Option<&[u8]>, signature: Option<&[u8]>) -> Result<Option<Vec<u8>>> {
418         let mut outcome = self.check_active().context("In finish")?;
419         if let Some(input) = input {
420             Self::check_input_length(input).context("In finish")?;
421         }
422         self.touch();
423 
424         let km_op: binder::public_api::Strong<dyn IKeyMintOperation> =
425             self.km_op.get_interface().context("In finish: Failed to get KeyMintOperation.")?;
426 
427         let (hat, tst, confirmation_token) = self
428             .auth_info
429             .lock()
430             .unwrap()
431             .before_finish()
432             .context("In finish: Trying to get auth tokens.")?;
433 
434         let output = self
435             .update_outcome(&mut *outcome, {
436                 let _wp = wd::watch_millis("Operation::finish: calling finish", 500);
437                 map_km_error(km_op.finish(
438                     input,
439                     signature,
440                     hat.as_ref(),
441                     tst.as_ref(),
442                     confirmation_token.as_deref(),
443                 ))
444             })
445             .context("In finish: KeyMint::finish failed.")?;
446 
447         self.auth_info.lock().unwrap().after_finish().context("In finish.")?;
448 
449         // At this point the operation concluded successfully.
450         *outcome = Outcome::Success;
451 
452         if output.is_empty() {
453             Ok(None)
454         } else {
455             Ok(Some(output))
456         }
457     }
458 
459     /// Aborts the operation if it is active. IFF the operation is aborted the outcome is
460     /// set to `outcome`. `outcome` must reflect the reason for the abort. Since the operation
461     /// gets aborted `outcome` must not be `Operation::Success` or `Operation::Unknown`.
abort(&self, outcome: Outcome) -> Result<()>462     fn abort(&self, outcome: Outcome) -> Result<()> {
463         let mut locked_outcome = self.check_active().context("In abort")?;
464         *locked_outcome = outcome;
465         let km_op: binder::public_api::Strong<dyn IKeyMintOperation> =
466             self.km_op.get_interface().context("In abort: Failed to get KeyMintOperation.")?;
467 
468         {
469             let _wp = wd::watch_millis("Operation::abort: calling abort", 500);
470             map_km_error(km_op.abort()).context("In abort: KeyMint::abort failed.")
471         }
472     }
473 }
474 
475 impl Drop for Operation {
drop(&mut self)476     fn drop(&mut self) {
477         let guard = self.outcome.lock().expect("In drop.");
478         log_key_operation_event_stats(
479             self.logging_info.sec_level,
480             self.logging_info.purpose,
481             &(self.logging_info.op_params),
482             &guard,
483             self.logging_info.key_upgraded,
484         );
485         if let Outcome::Unknown = *guard {
486             drop(guard);
487             // If the operation was still active we call abort, setting
488             // the outcome to `Outcome::Dropped`
489             if let Err(e) = self.abort(Outcome::Dropped) {
490                 log::error!("While dropping Operation: abort failed:\n    {:?}", e);
491             }
492         }
493     }
494 }
495 
496 /// The OperationDb holds weak references to all ongoing operations.
497 /// Its main purpose is to facilitate operation pruning.
498 #[derive(Debug, Default)]
499 pub struct OperationDb {
500     // TODO replace Vec with WeakTable when the weak_table crate becomes
501     // available.
502     operations: Mutex<Vec<Weak<Operation>>>,
503 }
504 
505 impl OperationDb {
506     /// Creates a new OperationDb.
new() -> Self507     pub fn new() -> Self {
508         Self { operations: Mutex::new(Vec::new()) }
509     }
510 
511     /// Creates a new operation.
512     /// This function takes a KeyMint operation and an associated
513     /// owner uid and returns a new Operation wrapped in a `std::sync::Arc`.
create_operation( &self, km_op: binder::public_api::Strong<dyn IKeyMintOperation>, owner: u32, auth_info: AuthInfo, forced: bool, logging_info: LoggingInfo, ) -> Arc<Operation>514     pub fn create_operation(
515         &self,
516         km_op: binder::public_api::Strong<dyn IKeyMintOperation>,
517         owner: u32,
518         auth_info: AuthInfo,
519         forced: bool,
520         logging_info: LoggingInfo,
521     ) -> Arc<Operation> {
522         // We use unwrap because we don't allow code that can panic while locked.
523         let mut operations = self.operations.lock().expect("In create_operation.");
524 
525         let mut index: usize = 0;
526         // First we iterate through the operation slots to try and find an unused
527         // slot. If we don't find one, we append the new entry instead.
528         match (*operations).iter_mut().find(|s| {
529             index += 1;
530             s.upgrade().is_none()
531         }) {
532             Some(free_slot) => {
533                 let new_op = Arc::new(Operation::new(
534                     index - 1,
535                     km_op,
536                     owner,
537                     auth_info,
538                     forced,
539                     logging_info,
540                 ));
541                 *free_slot = Arc::downgrade(&new_op);
542                 new_op
543             }
544             None => {
545                 let new_op = Arc::new(Operation::new(
546                     operations.len(),
547                     km_op,
548                     owner,
549                     auth_info,
550                     forced,
551                     logging_info,
552                 ));
553                 operations.push(Arc::downgrade(&new_op));
554                 new_op
555             }
556         }
557     }
558 
get(&self, index: usize) -> Option<Arc<Operation>>559     fn get(&self, index: usize) -> Option<Arc<Operation>> {
560         self.operations.lock().expect("In OperationDb::get.").get(index).and_then(|op| op.upgrade())
561     }
562 
563     /// Attempts to prune an operation.
564     ///
565     /// This function is used during operation creation, i.e., by
566     /// `KeystoreSecurityLevel::create_operation`, to try and free up an operation slot
567     /// if it got `ErrorCode::TOO_MANY_OPERATIONS` from the KeyMint backend. It is not
568     /// guaranteed that an operation slot is available after this call successfully
569     /// returned for various reasons. E.g., another thread may have snatched up the newly
570     /// available slot. Callers may have to call prune multiple times before they get a
571     /// free operation slot. Prune may also return `Err(Error::Rc(ResponseCode::BACKEND_BUSY))`
572     /// which indicates that no prunable operation was found.
573     ///
574     /// To find a suitable candidate we compute the malus for the caller and each existing
575     /// operation. The malus is the inverse of the pruning power (caller) or pruning
576     /// resistance (existing operation).
577     ///
578     /// The malus is based on the number of sibling operations and age. Sibling
579     /// operations are operations that have the same owner (UID).
580     ///
581     /// Every operation, existing or new, starts with a malus of 1. Every sibling
582     /// increases the malus by one. The age is the time since an operation was last touched.
583     /// It increases the malus by log6(<age in seconds> + 1) rounded down to the next
584     /// integer. So the malus increases stepwise after 5s, 35s, 215s, ...
585     /// Of two operations with the same malus the least recently used one is considered
586     /// weaker.
587     ///
588     /// For the caller to be able to prune an operation it must find an operation
589     /// with a malus higher than its own.
590     ///
591     /// The malus can be expressed as
592     /// ```
593     /// malus = 1 + no_of_siblings + floor(log6(age_in_seconds + 1))
594     /// ```
595     /// where the constant `1` accounts for the operation under consideration.
596     /// In reality we compute it as
597     /// ```
598     /// caller_malus = 1 + running_siblings
599     /// ```
600     /// because the new operation has no age and is not included in the `running_siblings`,
601     /// and
602     /// ```
603     /// running_malus = running_siblings + floor(log6(age_in_seconds + 1))
604     /// ```
605     /// because a running operation is included in the `running_siblings` and it has
606     /// an age.
607     ///
608     /// ## Example
609     /// A caller with no running operations has a malus of 1. Young (age < 5s) operations
610     /// also with no siblings have a malus of one and cannot be pruned by the caller.
611     /// We have to find an operation that has at least one sibling or is older than 5s.
612     ///
613     /// A caller with one running operation has a malus of 2. Now even young siblings
614     /// or single child aging (5s <= age < 35s) operations are off limit. An aging
615     /// sibling of two, however, would have a malus of 3 and would be fair game.
616     ///
617     /// ## Rationale
618     /// Due to the limitation of KeyMint operation slots, we cannot get around pruning or
619     /// a single app could easily DoS KeyMint.
620     /// Keystore 1.0 used to always prune the least recently used operation. This at least
621     /// guaranteed that new operations can always be started. With the increased usage
622     /// of Keystore we saw increased pruning activity which can lead to a livelock
623     /// situation in the worst case.
624     ///
625     /// With the new pruning strategy we want to provide well behaved clients with
626     /// progress assurances while punishing DoS attempts. As a result of this
627     /// strategy we can be in the situation where no operation can be pruned and the
628     /// creation of a new operation fails. This allows single child operations which
629     /// are frequently updated to complete, thereby breaking up livelock situations
630     /// and facilitating system wide progress.
631     ///
632     /// ## Update
633     /// We also allow callers to cannibalize their own sibling operations if no other
634     /// slot can be found. In this case the least recently used sibling is pruned.
prune(&self, caller: u32, forced: bool) -> Result<(), Error>635     pub fn prune(&self, caller: u32, forced: bool) -> Result<(), Error> {
636         loop {
637             // Maps the uid of the owner to the number of operations that owner has
638             // (running_siblings). More operations per owner lowers the pruning
639             // resistance of the operations of that owner. Whereas the number of
640             // ongoing operations of the caller lowers the pruning power of the caller.
641             let mut owners: HashMap<u32, u64> = HashMap::new();
642             let mut pruning_info: Vec<PruningInfo> = Vec::new();
643 
644             let now = Instant::now();
645             self.operations
646                 .lock()
647                 .expect("In OperationDb::prune: Trying to lock self.operations.")
648                 .iter()
649                 .for_each(|op| {
650                     if let Some(op) = op.upgrade() {
651                         if let Some(p_info) = op.get_pruning_info() {
652                             let owner = p_info.owner;
653                             pruning_info.push(p_info);
654                             // Count operations per owner.
655                             *owners.entry(owner).or_insert(0) += 1;
656                         }
657                     }
658                 });
659 
660             // If the operation is forced, the caller has a malus of 0.
661             let caller_malus = if forced { 0 } else { 1u64 + *owners.entry(caller).or_default() };
662 
663             // We iterate through all operations computing the malus and finding
664             // the candidate with the highest malus which must also be higher
665             // than the caller_malus.
666             struct CandidateInfo {
667                 index: usize,
668                 malus: u64,
669                 last_usage: Instant,
670                 age: Duration,
671             }
672             let mut oldest_caller_op: Option<CandidateInfo> = None;
673             let candidate = pruning_info.iter().fold(
674                 None,
675                 |acc: Option<CandidateInfo>, &PruningInfo { last_usage, owner, index, forced }| {
676                     // Compute the age of the current operation.
677                     let age = now
678                         .checked_duration_since(last_usage)
679                         .unwrap_or_else(|| Duration::new(0, 0));
680 
681                     // Find the least recently used sibling as an alternative pruning candidate.
682                     if owner == caller {
683                         if let Some(CandidateInfo { age: a, .. }) = oldest_caller_op {
684                             if age > a {
685                                 oldest_caller_op =
686                                     Some(CandidateInfo { index, malus: 0, last_usage, age });
687                             }
688                         } else {
689                             oldest_caller_op =
690                                 Some(CandidateInfo { index, malus: 0, last_usage, age });
691                         }
692                     }
693 
694                     // Compute the malus of the current operation.
695                     let malus = if forced {
696                         // Forced operations have a malus of 0. And cannot even be pruned
697                         // by other forced operations.
698                         0
699                     } else {
700                         // Expect safety: Every owner in pruning_info was counted in
701                         // the owners map. So this unwrap cannot panic.
702                         *owners.get(&owner).expect(
703                             "This is odd. We should have counted every owner in pruning_info.",
704                         ) + ((age.as_secs() + 1) as f64).log(6.0).floor() as u64
705                     };
706 
707                     // Now check if the current operation is a viable/better candidate
708                     // the one currently stored in the accumulator.
709                     match acc {
710                         // First we have to find any operation that is prunable by the caller.
711                         None => {
712                             if caller_malus < malus {
713                                 Some(CandidateInfo { index, malus, last_usage, age })
714                             } else {
715                                 None
716                             }
717                         }
718                         // If we have found one we look for the operation with the worst score.
719                         // If there is a tie, the older operation is considered weaker.
720                         Some(CandidateInfo { index: i, malus: m, last_usage: l, age: a }) => {
721                             if malus > m || (malus == m && age > a) {
722                                 Some(CandidateInfo { index, malus, last_usage, age })
723                             } else {
724                                 Some(CandidateInfo { index: i, malus: m, last_usage: l, age: a })
725                             }
726                         }
727                     }
728                 },
729             );
730 
731             // If we did not find a suitable candidate we may cannibalize our oldest sibling.
732             let candidate = candidate.or(oldest_caller_op);
733 
734             match candidate {
735                 Some(CandidateInfo { index, malus: _, last_usage, age: _ }) => {
736                     match self.get(index) {
737                         Some(op) => {
738                             match op.prune(last_usage) {
739                                 // We successfully freed up a slot.
740                                 Ok(()) => break Ok(()),
741                                 // This means the operation we tried to prune was on its way
742                                 // out. It also means that the slot it had occupied was freed up.
743                                 Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)) => break Ok(()),
744                                 // This means the operation we tried to prune was currently
745                                 // servicing a request. There are two options.
746                                 // * Assume that it was touched, which means that its
747                                 //   pruning resistance increased. In that case we have
748                                 //   to start over and find another candidate.
749                                 // * Assume that the operation is transitioning to end-of-life.
750                                 //   which means that we got a free slot for free.
751                                 // If we assume the first but the second is true, we prune
752                                 // a good operation without need (aggressive approach).
753                                 // If we assume the second but the first is true, our
754                                 // caller will attempt to create a new KeyMint operation,
755                                 // fail with `ErrorCode::TOO_MANY_OPERATIONS`, and call
756                                 // us again (conservative approach).
757                                 Err(Error::Rc(ResponseCode::OPERATION_BUSY)) => {
758                                     // We choose the conservative approach, because
759                                     // every needlessly pruned operation can impact
760                                     // the user experience.
761                                     // To switch to the aggressive approach replace
762                                     // the following line with `continue`.
763                                     break Ok(());
764                                 }
765 
766                                 // The candidate may have been touched so the score
767                                 // has changed since our evaluation.
768                                 _ => continue,
769                             }
770                         }
771                         // This index does not exist any more. The operation
772                         // in this slot was dropped. Good news, a slot
773                         // has freed up.
774                         None => break Ok(()),
775                     }
776                 }
777                 // We did not get a pruning candidate.
778                 None => break Err(Error::Rc(ResponseCode::BACKEND_BUSY)),
779             }
780         }
781     }
782 }
783 
784 /// Implementation of IKeystoreOperation.
785 pub struct KeystoreOperation {
786     operation: Mutex<Option<Arc<Operation>>>,
787 }
788 
789 impl KeystoreOperation {
790     /// Creates a new operation instance wrapped in a
791     /// BnKeystoreOperation proxy object. It also enables
792     /// `BinderFeatures::set_requesting_sid` on the new interface, because
793     /// we need it for checking Keystore permissions.
new_native_binder( operation: Arc<Operation>, ) -> binder::public_api::Strong<dyn IKeystoreOperation>794     pub fn new_native_binder(
795         operation: Arc<Operation>,
796     ) -> binder::public_api::Strong<dyn IKeystoreOperation> {
797         BnKeystoreOperation::new_binder(
798             Self { operation: Mutex::new(Some(operation)) },
799             BinderFeatures { set_requesting_sid: true, ..BinderFeatures::default() },
800         )
801     }
802 
803     /// Grabs the outer operation mutex and calls `f` on the locked operation.
804     /// The function also deletes the operation if it returns with an error or if
805     /// `delete_op` is true.
with_locked_operation<T, F>(&self, f: F, delete_op: bool) -> Result<T> where for<'a> F: FnOnce(&'a Operation) -> Result<T>,806     fn with_locked_operation<T, F>(&self, f: F, delete_op: bool) -> Result<T>
807     where
808         for<'a> F: FnOnce(&'a Operation) -> Result<T>,
809     {
810         let mut delete_op: bool = delete_op;
811         match self.operation.try_lock() {
812             Ok(mut mutex_guard) => {
813                 let result = match &*mutex_guard {
814                     Some(op) => {
815                         let result = f(&*op);
816                         // Any error here means we can discard the operation.
817                         if result.is_err() {
818                             delete_op = true;
819                         }
820                         result
821                     }
822                     None => Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE))
823                         .context("In KeystoreOperation::with_locked_operation"),
824                 };
825 
826                 if delete_op {
827                     // We give up our reference to the Operation, thereby freeing up our
828                     // internal resources and ending the wrapped KeyMint operation.
829                     // This KeystoreOperation object will still be owned by an SpIBinder
830                     // until the client drops its remote reference.
831                     *mutex_guard = None;
832                 }
833                 result
834             }
835             Err(_) => Err(Error::Rc(ResponseCode::OPERATION_BUSY))
836                 .context("In KeystoreOperation::with_locked_operation"),
837         }
838     }
839 }
840 
841 impl binder::Interface for KeystoreOperation {}
842 
843 impl IKeystoreOperation for KeystoreOperation {
updateAad(&self, aad_input: &[u8]) -> binder::public_api::Result<()>844     fn updateAad(&self, aad_input: &[u8]) -> binder::public_api::Result<()> {
845         let _wp = wd::watch_millis("IKeystoreOperation::updateAad", 500);
846         map_or_log_err(
847             self.with_locked_operation(
848                 |op| op.update_aad(aad_input).context("In KeystoreOperation::updateAad"),
849                 false,
850             ),
851             Ok,
852         )
853     }
854 
update(&self, input: &[u8]) -> binder::public_api::Result<Option<Vec<u8>>>855     fn update(&self, input: &[u8]) -> binder::public_api::Result<Option<Vec<u8>>> {
856         let _wp = wd::watch_millis("IKeystoreOperation::update", 500);
857         map_or_log_err(
858             self.with_locked_operation(
859                 |op| op.update(input).context("In KeystoreOperation::update"),
860                 false,
861             ),
862             Ok,
863         )
864     }
finish( &self, input: Option<&[u8]>, signature: Option<&[u8]>, ) -> binder::public_api::Result<Option<Vec<u8>>>865     fn finish(
866         &self,
867         input: Option<&[u8]>,
868         signature: Option<&[u8]>,
869     ) -> binder::public_api::Result<Option<Vec<u8>>> {
870         let _wp = wd::watch_millis("IKeystoreOperation::finish", 500);
871         map_or_log_err(
872             self.with_locked_operation(
873                 |op| op.finish(input, signature).context("In KeystoreOperation::finish"),
874                 true,
875             ),
876             Ok,
877         )
878     }
879 
abort(&self) -> binder::public_api::Result<()>880     fn abort(&self) -> binder::public_api::Result<()> {
881         let _wp = wd::watch_millis("IKeystoreOperation::abort", 500);
882         map_err_with(
883             self.with_locked_operation(
884                 |op| op.abort(Outcome::Abort).context("In KeystoreOperation::abort"),
885                 true,
886             ),
887             |e| {
888                 match e.root_cause().downcast_ref::<Error>() {
889                     // Calling abort on expired operations is something very common.
890                     // There is no reason to clutter the log with it. It is never the cause
891                     // for a true problem.
892                     Some(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)) => {}
893                     _ => log::error!("{:?}", e),
894                 };
895                 e
896             },
897             Ok,
898         )
899     }
900 }
901