# -*- coding: utf-8 -*-
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Additional help about object versioning."""

from __future__ import absolute_import

from gslib.help_provider import HelpProvider

_DETAILED_HELP_TEXT = ("""
<B>OVERVIEW</B>
  Versioning-enabled buckets maintain an archive of objects, providing a way to
  un-delete data that you accidentally deleted, or to retrieve older versions of
  your data. You can turn versioning on or off for a bucket at any time. Turning
  versioning off leaves existing object versions in place, and simply causes the
  bucket to stop accumulating new object versions. In this case, if you upload
  to an existing object, the upload overwrites the current version instead of
  creating a new version.

  Regardless of whether you have enabled versioning on a bucket, every object
  has two associated positive integer fields:

  - the generation, which is updated when the content of an object is
    overwritten.
  - the metageneration, which identifies the metadata generation. It starts
    at 1, is updated every time the metadata (e.g., ACL or Content-Type) for a
    given content generation is updated, and is reset to 1 when the generation
    number changes.

  Of these two integers, only the generation is used when working with versioned
  data. Both generation and metageneration can be used with concurrency control
  (discussed in a later section).

  To work with object versioning in gsutil, you can use storage URIs that embed
  the object generation, which we refer to as version-specific URIs. For
  example, the version-less object URI:

    gs://bucket/object

  might have two versions, with these version-specific URIs:

    gs://bucket/object#1360383693690000
    gs://bucket/object#1360383802725000
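
  If you script around gsutil, you may want to split a version-specific URI
  into its version-less URI and its generation. The following Python sketch
  shows one way to do that. The helper name split_version_uri is hypothetical
  (it is not part of gsutil), and the sketch assumes the object name itself
  contains no '#' followed only by digits:

```python
# Hypothetical helper (not part of gsutil): split a version-specific URI
# into its version-less URI and its integer generation.
def split_version_uri(uri):
    base, sep, generation = uri.rpartition('#')
    if not sep or not generation.isdigit():
        # No '#<digits>' suffix, so treat the URI as version-less.
        return uri, None
    return base, int(generation)


print(split_version_uri('gs://bucket/object#1360383693690000'))
print(split_version_uri('gs://bucket/object'))
```

  A version-less URI comes back unchanged with a generation of None, so the
  same helper can be applied to either form.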

  The following sections discuss how to work with versioning and concurrency
  control.


<B>OBJECT VERSIONING</B>
  You can view, enable, and disable object versioning on a bucket using
  the 'versioning get' and 'versioning set' commands. For example:

    gsutil versioning set on gs://bucket

  will enable versioning for the named bucket. See 'gsutil help versioning'
  for additional details.

  To see all object versions in a versioning-enabled bucket along with
  their generation information, use gsutil ls -a:

    gsutil ls -a gs://bucket

  You can also specify particular objects for which you want to find the
  version-specific URI(s), or you can use wildcards:

    gsutil ls -a gs://bucket/object1 gs://bucket/images/*.jpg

  The generation values form a monotonically increasing sequence as you create
  additional object versions. Because of this, the latest object version is
  always the last one listed in the gsutil ls output for a particular object.
  For example, if a bucket contains these three versions of gs://bucket/object:

    gs://bucket/object#1360035307075000
    gs://bucket/object#1360101007329000
    gs://bucket/object#1360102216114000

  then gs://bucket/object#1360102216114000 is the latest version and
  gs://bucket/object#1360035307075000 is the oldest available version.
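
  Because generations increase monotonically, a script can pick the latest
  version by comparing the integers after the '#'. The following Python sketch
  illustrates the idea. The function name latest_version is hypothetical, and
  the sketch assumes every input line is a version-specific URI for the same
  object, as in the listing above:

```python
# Hypothetical helper: given 'gsutil ls -a' output lines for one object,
# return the version-specific URI with the highest generation.
def latest_version(listing_lines):
    def generation(uri):
        return int(uri.rpartition('#')[2])

    uris = [line.strip() for line in listing_lines if '#' in line]
    return max(uris, key=generation) if uris else None


versions = [
    'gs://bucket/object#1360035307075000',
    'gs://bucket/object#1360101007329000',
    'gs://bucket/object#1360102216114000',
]
print(latest_version(versions))
```

  Comparing the generations as integers, rather than relying on the order of
  the lines, keeps the result the same even if the input has been reordered.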

  If you specify version-less URIs with gsutil, you will operate on the
  latest not-deleted version of an object, for example:

    gsutil cp gs://bucket/object ./dir

  or:

    gsutil rm gs://bucket/object

  To operate on a specific object version, use a version-specific URI.
  For example, suppose the output of the above gsutil ls -a command is:

    gs://bucket/object#1360035307075000
    gs://bucket/object#1360101007329000

  In this case, the command:

    gsutil cp gs://bucket/object#1360035307075000 ./dir

  will retrieve the second most recent version of the object.

  Note that version-specific URIs cannot be the target of the gsutil cp
  command (trying to do so will result in an error), because writing to a
  versioned object always creates a new version.

  If an object has been deleted, it will not show up in a normal gsutil ls
  listing (i.e., ls without the -a option). You can restore a deleted object by
  running gsutil ls -a to find the available versions, and then copying one of
  the version-specific URIs to the version-less URI, for example:

    gsutil cp gs://bucket/object#1360101007329000 gs://bucket/object

  Note that when you do this it creates a new object version, which will incur
  additional charges. You can get rid of the extra copy by deleting the older
  version-specific object:

    gsutil rm gs://bucket/object#1360101007329000

  Or you can combine the two steps by using the gsutil mv command:

    gsutil mv gs://bucket/object#1360101007329000 gs://bucket/object

  If you want to remove all versions of an object, use the gsutil rm -a option:

    gsutil rm -a gs://bucket/object

  Note that there is no limit to the number of older versions of an object you
  will create if you continue to upload to the same object in a
  versioning-enabled bucket. It is your responsibility to delete versions
  beyond the ones you want to retain.


<B>COPYING VERSIONED BUCKETS</B>
  You can copy data between two versioned buckets, using a command like:

    gsutil cp -r -A gs://bucket1/* gs://bucket2

  When run using versioned buckets, this command will cause every object version
  to be copied. The copies made in gs://bucket2 will have different generation
  numbers (since a new generation is assigned when the object copy is made),
  but the object sort order will remain consistent. For example, gs://bucket1
  might contain:

    % gsutil ls -la gs://bucket1
    53  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257574000  metageneration=1
    12  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257615000  metageneration=1
    97  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257665000  metageneration=1

  and after the copy, gs://bucket2 might contain:

    % gsutil ls -la gs://bucket2
    53  2013-06-06T02:33:11Z  gs://bucket2/file#1370485991580000  metageneration=1
    12  2013-06-06T02:33:14Z  gs://bucket2/file#1370485994328000  metageneration=1
    97  2013-06-06T02:33:17Z  gs://bucket2/file#1370485997376000  metageneration=1

  Note that the object versions are in the same order (as can be seen by the
  same sequence of sizes in both listings), but the generation numbers (and
  timestamps) are newer in gs://bucket2.
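
  If you want to check this ordering in a script, you could parse the listing
  lines and compare the sizes in generation order. The following Python sketch
  assumes each line has exactly the four whitespace-separated fields shown in
  the listings above. The function names are hypothetical and not part of
  gsutil:

```python
# Hypothetical parser for the 'gsutil ls -la' lines shown above. Assumes
# four fields per line: size, timestamp, URI, metageneration=N.
def parse_listing_line(line):
    size, timestamp, uri, meta = line.split()
    generation = int(uri.rpartition('#')[2])
    metageneration = int(meta.partition('=')[2])
    return int(size), timestamp, generation, metageneration


def sizes_in_version_order(lines):
    rows = sorted((parse_listing_line(line) for line in lines),
                  key=lambda row: row[2])
    return [row[0] for row in rows]


bucket1 = [
    '53  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257574000  metageneration=1',
    '12  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257615000  metageneration=1',
    '97  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257665000  metageneration=1',
]
bucket2 = [
    '53  2013-06-06T02:33:11Z  gs://bucket2/file#1370485991580000  metageneration=1',
    '12  2013-06-06T02:33:14Z  gs://bucket2/file#1370485994328000  metageneration=1',
    '97  2013-06-06T02:33:17Z  gs://bucket2/file#1370485997376000  metageneration=1',
]
print(sizes_in_version_order(bucket1) == sizes_in_version_order(bucket2))
```

  Matching sizes are only a rough consistency check; comparing checksums
  would be a stronger test that the copied versions are identical.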


<B>CONCURRENCY CONTROL</B>
  If you are building an application using Google Cloud Storage, you may need to
  be careful about concurrency control. Normally gsutil itself isn't used for
  this purpose, but it's possible to write scripts around gsutil that perform
  concurrency control.

  For example, suppose you want to implement a "rolling update" system using
  gsutil, where a periodic job computes some data and uploads it to the cloud.
  On each run, the job starts with the data that it computed from the last run,
  and computes a new value. To make this system robust, you need to have
  multiple machines on which the job can run, which raises the possibility that
  two simultaneous runs could attempt to update an object at the same time. This
  leads to the following potential race condition:

  - job 1 computes the new value to be written
  - job 2 computes the new value to be written
  - job 2 writes the new value
  - job 1 writes the new value

  In this case, the value that job 1 read is no longer current by the time
  it goes to write the updated object, and writing at this point would result
  in stale (or, depending on the application, corrupt) data.

  To prevent this, you can record the version-specific name of the object when
  it is created, and then use the generation in that URI to specify an
  x-goog-if-generation-match header on a subsequent gsutil cp command. You can
  do this in two steps. First, use the gsutil cp -v option at upload time to
  print the version-specific name of the object that was created, for example:

    gsutil cp -v file gs://bucket/object

  might output:

    Created: gs://bucket/object#1360432179236000

  Second, extract the generation value from this URI and construct a
  subsequent gsutil command like this:

    gsutil -h x-goog-if-generation-match:1360432179236000 cp newfile \\
        gs://bucket/object

  This command asks Google Cloud Storage to upload newfile, but to fail the
  request if the generation of gs://bucket/object that is live at the time of
  the upload does not match the one specified.
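
  In a script, the two steps above might be tied together as in the following
  Python sketch. It assumes that gsutil cp -v printed a line of exactly the
  form shown above. The function name conditional_cp_args is hypothetical, and
  the sketch only builds the argument list; it does not run gsutil:

```python
# Sketch only: assumes 'gsutil cp -v' printed a line of the form
# 'Created: gs://bucket/object#<generation>', as in the example above.
def conditional_cp_args(created_line, new_file):
    prefix = 'Created: '
    if not created_line.startswith(prefix):
        raise ValueError('unexpected cp -v output: ' + created_line)
    uri = created_line[len(prefix):].strip()
    base, sep, generation = uri.rpartition('#')
    if not sep or not generation.isdigit():
        raise ValueError('no generation in URI: ' + uri)
    header = 'x-goog-if-generation-match:' + generation
    return ['gsutil', '-h', header, 'cp', new_file, base]


args = conditional_cp_args('Created: gs://bucket/object#1360432179236000',
                           'newfile')
print(' '.join(args))
```

  You could then pass the list to a function such as subprocess.call and
  treat a non-zero exit status as a sign that the upload did not happen, for
  example because another job wrote the object first.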

  If the command you use updates object metadata, you will need to find the
  current metageneration for an object. To do this, use the gsutil ls -a and
  -l options. For example, the command:

    gsutil ls -l -a gs://bucket/object

  will output something like:

      64  2013-02-12T19:59:13Z  gs://bucket/object#1360699153986000  metageneration=3
    1521  2013-02-13T02:04:08Z  gs://bucket/object#1360721048778000  metageneration=2

  Given this information, you could use the following command to request setting
  the ACL on the older version of the object, such that the command will fail
  unless that is the current version of the data+metadata:

    gsutil -h x-goog-if-generation-match:1360699153986000 -h \\
      x-goog-if-metageneration-match:3 acl set public-read \\
      gs://bucket/object#1360699153986000

  Without adding these headers, the update would simply overwrite the existing
  ACL. Note that in contrast, the "gsutil acl ch" command uses these headers
  automatically, because it performs a read-modify-write cycle in order to edit
  ACLs.

  If you want to experiment with how generations and metagenerations work, try
  the following. First, upload an object; then use gsutil ls -l -a to list all
  versions of the object, along with each version's metageneration; then
  re-upload the object and repeat the gsutil ls -l -a. You should see two object
  versions, each with metageneration=1. Now try setting the ACL, and rerun the
  gsutil ls -l -a. You should see the most recent object generation now has
  metageneration=2.
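
  The experiment above, and the precondition behavior described in this
  section, can be mimicked with a small in-memory model. The following Python
  sketch is only an illustration: FakeObject and PreconditionFailed are
  hypothetical names, not gsutil or Google Cloud Storage APIs, and the model
  uses small counters in place of real generation values:

```python
class PreconditionFailed(Exception):
    pass


class FakeObject(object):
    # Hypothetical in-memory stand-in for one object. The counters here are
    # small integers; real generations are much larger.

    def __init__(self):
        self.generation = 0        # 0 means no live version yet
        self.metageneration = 0
        self.data = None
        self.acl = None

    def upload(self, data, if_generation_match=None):
        if if_generation_match not in (None, self.generation):
            raise PreconditionFailed('generation mismatch')
        self.generation += 1       # new content generation
        self.metageneration = 1    # metadata counter starts over
        self.data = data

    def set_acl(self, acl, if_metageneration_match=None):
        if if_metageneration_match not in (None, self.metageneration):
            raise PreconditionFailed('metageneration mismatch')
        self.metageneration += 1
        self.acl = acl


obj = FakeObject()
obj.upload('first')
obj.upload('second')
obj.set_acl('public-read')
after_acl = (obj.generation, obj.metageneration)

seen = obj.generation                          # both jobs read this value
obj.upload('job 2', if_generation_match=seen)  # succeeds
try:
    obj.upload('job 1', if_generation_match=seen)
except PreconditionFailed:
    print('job 1 must re-read the object and retry')
```

  In this model the second writer fails instead of silently overwriting the
  first writer's data, which is the behavior the generation-match header is
  meant to provide.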


<B>FOR MORE INFORMATION</B>
  For more details on how to use versioning and preconditions, see
  https://developers.google.com/storage/docs/object-versioning
""")


class CommandOptions(HelpProvider):
  """Additional help about object versioning."""

  # Help specification. See help_provider.py for documentation.
  help_spec = HelpProvider.HelpSpec(
      help_name='versions',
      help_name_aliases=['concurrency', 'concurrency control'],
      help_type='additional_help',
      help_one_line_summary='Object Versioning and Concurrency Control',
      help_text=_DETAILED_HELP_TEXT,
      subcommand_help_text={},
  )