Data Migration Process and Considerations
Making use of a Google+ Data Takeout archive can be more complicated than it first appears, particularly for a large archive with contributions from many people or organisations. You'll want to consider:
- What to archive.
- What you want to use from it.
- How you plan to use the data.
- What portions of the archive you want to make public, are able to make public, and have permission to make public.
- Where you plan to publish it, and what tools exist to import the selections you publish.
Remember: Data are liability. Information which is useful to you may also be dangerous, to yourself or others, if made generally available.
This page is under development as we explore and confirm procedures for Google Data Takeout. Consider information preliminary, as of 3 December 2018.
- 1 Cautions and disclaimers specific to Data Migration information
- 2 A basic data migration plan
- 3 Google+ Data Migration
- 4 Further considerations
- 5 Third Party Tools
- 6 References
Cautions and disclaimers specific to Data Migration information
Our intent is to provide useful and helpful information. This is a Wiki: it is generally editable, though it is also patrolled by editors and administrators. We cannot guarantee that the information at any time is either correct or non-malicious, though we will make reasonable attempts to ensure that it is both.
You should ensure that Web pages represented as Google properties are in fact Google properties when you navigate to them.
Do not enter your Google username and password into any non-Google site or domain unless that is specifically what you intend.
(As an example: if you are using a third-party tool to manage your Google site.)
In general, Google will provide single-use or single-application passwords for such access. Make use of such tools if at all possible.
We are intentionally keeping use of URLs on this page to an absolute minimum, and are presenting naked rather than formatted URLs (e.g., https://google.com/ rather than Google) in most instances. Verify URLs referenced here.
(This is vaguely paranoid on our part, but we're aware of potential for malice, vandalism, and abuse, and wish to minimise risks.)
Though we're referencing PlexodusWiki, this is good general guidance and practice.
Now on to the migration....
A basic data migration plan
The steps and processes given here as of 15 October 2018 are preliminary and more an outline than a procedure. We hope to improve and expand on them over time, particularly up to the mid-January 2019 window in which we anticipate many final export decisions, and the January-April 2019 window during which import and republishing will occur. Improvements are welcomed, particularly as Google clarifies capabilities, documentation, and processes, and as destination platforms provide specific tools or processes.
The general steps are:
- Identifying the information types you want to keep.
- Identifying the information types you can or should keep.
- Determining how you plan to use that information. Examples include posting to a blog, importing to another social media site, creating a personal archive, importing addresses and contacts, or creating a new forum or community site.
- Exporting the data from Google.
- Storing it until you can process it and have verified the final importing process.
- Unpacking, identifying, and selecting archive components.
- Converting extracted data to useful or usable formats.
- Cleaning up, converting, or updating the archives (may be done later). Includes updating or removing user/author references, URLs, and the like.
- Importing the data to the target or destination platform.
- Verifying import.
The general data types within your Google Takeout archive will be:
- Your Google+ posts
- Your Google+ comments
- Others' comments on your Google+ posts.
- Your uploaded photos, videos, and other media.
- Contact information.
- Your profile description and metadata. Generally: your name, vitals, contact information, and "About" page descriptions and links.
- Miscellaneous other data.
This list may be inaccurate or incomplete.
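Since the list above may be incomplete, one way to see what a particular archive actually contains is to inventory it before unpacking. The following is a minimal sketch, assuming the takeout was delivered as a zip file; the function name inventory_archive is ours and is not part of any Google tool.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def inventory_archive(zip_path):
    """Count the files in a zip archive, grouped by lowercased extension."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data of their own
            # Zip member names always use forward slashes.
            suffix = PurePosixPath(info.filename).suffix.lower() or "(none)"
            counts[suffix] += 1
    return counts


# Example use (the filename is a placeholder):
#   for suffix, count in inventory_archive("takeout.zip").most_common():
#       print(count, suffix)
```

This reads only the archive's index, so it is quick even for large archives and does not write any extracted data to disk.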
Google+ Data Migration
Again the process is:
- Exporting Google data
- Storing Google+ Data
- Extracting, classifying, and converting Google+ data.
- Importing or publishing to target platforms.
Google provides for data export via its Google Data Takeout page, also referred to as Download your Data. This is part of the Data Liberation Front project within Google. All three terms are used at various points.
- The Google Data Takeout URL is: https://takeout.google.com/settings/takeout
- PLEASE NOTE THAT YOU SHOULD VERIFY THAT THIS IS A GOOGLE DOMAIN AND YOU SHOULD VERIFY THIS LOCATION INDEPENDENTLY.
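As an illustration of what verifying a Google domain means in practice, the sketch below checks that a URL uses https and that its host is google.com or a subdomain of it. This is only a string check under our own assumptions (the function name and the trusted_domain parameter are ours). It cannot tell you that a site is genuine, so also check the address bar and certificate in your browser.

```python
from urllib.parse import urlsplit


def is_google_https_url(url, trusted_domain="google.com"):
    """Return True only for https URLs hosted on trusted_domain or a subdomain of it."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # hostname drops any "user@" prefix and port, and is already lowercased.
    host = (parts.hostname or "").rstrip(".")
    return host == trusted_domain or host.endswith("." + trusted_domain)
```

Note that a lookalike such as https://takeout.google.com.evil.example/ is rejected, because the check compares the end of the host name rather than searching for "google.com" anywhere in the URL.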
It is also possible to specify specific products to be archived on the URL, for example, Google+ Pages, Circles, Stream, and Plus Ones:
TODO: Add Profile, confirm elements selected.
Google also provide help on Google Data Takeout. We feel that some of the guidance is not as useful as it could be, but you should consult it here:
We recommend reading through the rest of this page before creating your data archive, as there are considerations presented here. We will be providing further guidance in future on the choices we feel are preferable.
Please note that we can make no guarantee of information provided here, and that all liability is disclaimed. Information is provided in good faith, though this page is open to general editing.
Exporting Google+ data
There may (and almost certainly will) be tools for utilising your Google Data Takeout automatically, potentially online through Google tools and/or services, such as Google Drive.
As an alternative it is possible to work with the archive directly using commonly available tools on a Linux, MacOS, or Windows desktop or laptop computer. This should be considered an advanced and technical process. If you are not comfortable using the bash or similar command shells, and scripting languages such as awk, Perl, Python, Ruby, etc., you are strongly encouraged to skip this section.
This is a brief sketch of the process Dredmorbius used several years ago, on a Linux system. It should be fleshed out into a script or program. It is possible (though perhaps not likely) that Google will themselves provide tools or systems for managing archives. This request has been made and others are encouraged to request it. Google-provided support should include tools to select and import data 'responsibly' to destination platforms. Responsible importing means respecting privacy scope.
You will have the option of specifying JSON or HTML formats for data export. Google's JSON data is far more usable and useful than the HTML format, and is better supported by import tools.
The questions of want, can, may, and should refer to your preferences, abilities, permission, and risk exposure or resource limitations. Available export and import tools, copyright and other legal limitations or risks, privacy or appropriateness, and just general suitability, are among these considerations.
It may make sense to abandon some, much, or all of your data.
These are questions you and possibly your community must decide for yourselves.
The general process:
- Determine what data you want to, can, should, and may retain.
- Create your Google+ takeout. Select the JSON export format, NOT the HTML option.
- You probably want to include Posts, Comments, and Contacts, at a minimum, from Google+. You can include media such as photos, audio, and video if you like, or create a separate archive for them.
- Request the archive, and wait. Creation may take hours, possibly days. You will receive an email or notification when it is complete.
- If the archive fails or is incomplete, you will need to regenerate it. There are reports of many archive attempts failing. Google have been made aware of this; more feedback should help.
Storing Google+ data
Your archive represents both valuable and potentially harmful information. Loss, modification, or disclosure could all pose dangers.
- Store the archive in a safe place. This means somewhere where it can not be accidentally deleted or corrupted and where others cannot gain unauthorised access to it.
- Your G+ takeout _will_ almost certainly contain private data from or about you or others. Treat it as valuable and sensitive information. Again, there is a saying in security circles: Data are liability.
- With some irony, Google Drive is probably one of the better options for storage: easy, accessible, durable, and reasonably safe. The information is already on Google, so you’re not changing the risk calculus too much.
- You may (and probably want to) download the archive to your own computer (laptop or desktop). Be aware that storing it there exposes it to damage, loss, or breach.
- One or more offline copies, saved to USB storage, a local NAS or archive system, or CD, DVD, or Blu-Ray media, are another Good Practice. I recommend optical media. Keep in mind that at roughly 700 MB, CDs have limited storage relative to archives which may be 1-100 GB or larger. Blu-Ray may be your best option here. Burn 2-3 sets and store securely in separate locations as protection against damage or loss.
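When keeping several copies, it helps to be able to confirm that each copy is still identical to the original. One common approach, offered here as a sketch rather than a prescribed procedure, is to compare SHA-256 checksums. The function names below are ours.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the empty bytes object returned at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_path, copy_path):
    """Return True when both files have the same SHA-256 digest."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)
```

Recording the digest alongside each stored copy also lets you detect later corruption without needing the original to hand.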
That’s got you your archive.
Extracting, classifying, and converting Google+ data
The jq (JSON query) utility can query, output, and process JSON archives. This is how you extract information from the archive.
- Your post and comment data will appear in Google-markup format. This includes _italic_ and *bold* markup, as well as internal Google profile references. I don’t recall their format, but references may not be directly translatable.
- A simple script (sed, awk, Perl, Python, Ruby) can substitute HTML or Markdown tags within the content. I don’t think Pandoc directly recognises G+ markup, but if it’s close enough to AsciiDoc, on which I think it’s based, that’s another option.
- Pandoc can create HTML, or any of dozens of other formats, from Markdown (and possibly directly from the G+ tags). So that’s how you get HTML.
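The tag substitution mentioned above might look something like the following sketch. It assumes only the two markup forms named here, *bold* and _italic_, and assumes they are delimited at word boundaries and do not span lines. The real G+ rules may differ, and profile references are not handled. The function name is ours.

```python
import html
import re

# Assumed rules: a marker opens before a non-space character, closes after one,
# and is not glued to a letter or digit on its outer side (so 2*3*4 and
# snake_case names are left alone).
_BOLD = re.compile(r"(?<!\w)\*(?=\S)(.+?)(?<=\S)\*(?!\w)")
_ITALIC = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")


def gplus_markup_to_html(text):
    """Escape HTML-special characters, then convert *bold* and _italic_ spans to tags."""
    escaped = html.escape(text, quote=False)
    escaped = _BOLD.sub(r"<b>\1</b>", escaped)
    escaped = _ITALIC.sub(r"<i>\1</i>", escaped)
    return escaped
```

Escaping first means that any angle brackets or ampersands in the original post cannot be mistaken for markup in the output.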
You’ve still got the problems of:
- Identifying post context, date, author, thread, and privacy scope. Those are contained in the JSON formats, but it’s been years since I’ve looked at them.
- Determining which data you do and which you do NOT want to make public. Because you could be violating original privacy scope and intent.
Those two issues mean that you should NOT simply blindly import and publish your Google+ archive on some new site or platform. You will want to review content. Skipping any non-public material as a first option is a Very Good Practice.
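A first pass at skipping non-public material could be sketched as below. The key names postAcl and isPublic are placeholders for illustration only and are NOT a confirmed description of the archive layout. Check the actual field names against your own archive before relying on this. The function names are ours.

```python
import json
from pathlib import Path


def load_posts(directory):
    """Load every *.json file under directory, keeping those that hold a JSON object."""
    posts = []
    for path in sorted(Path(directory).rglob("*.json")):
        with open(path, encoding="utf-8") as handle:
            record = json.load(handle)
        if isinstance(record, dict):
            posts.append(record)
    return posts


def is_public(post):
    """HYPOTHETICAL check: assumes a boolean at post["postAcl"]["isPublic"]."""
    acl = post.get("postAcl")
    return isinstance(acl, dict) and acl.get("isPublic") is True


def select_public(posts, predicate=is_public):
    """Keep only the posts that the predicate positively identifies as public."""
    return [post for post in posts if predicate(post)]
```

The filter is deliberately default-deny: a post with a missing or unrecognised privacy marker is skipped rather than published. A pass like this narrows the set to review and does not replace reviewing it.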
That may still not be sufficient due to copyright or other considerations of possible criminal or civil liability, or simply the risk of annoying whoever wrote the original content or is referenced in it. You will have to use judgement here. Again, this means that simply redirecting the entire archive is not a viable process.
For specifics on the data structure see Google Takeout Data Structure.
Import or publish to target platforms
Finally, import the data or publish it to your intended platforms.
Contact information should be in vCard format, supported by most email and contact-management systems. The contacts may _not_ be particularly useful if they don't include email, phone, or other non-Google+ addresses.
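To find out how many exported contacts carry an email address or phone number, a rough check such as the following may help. It is a deliberately simple reader for the common vCard text layout (BEGIN:VCARD ... END:VCARD blocks with EMAIL and TEL properties), not a full vCard parser, and the function names are ours.

```python
def split_vcards(text):
    """Split vCard text into cards, each a list of unfolded property lines."""
    cards, current = [], None
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and current:
            current[-1] += raw[1:]  # unfold a continuation line
            continue
        line = raw.strip()
        marker = line.upper()
        if marker == "BEGIN:VCARD":
            current = []
        elif marker == "END:VCARD":
            if current is not None:
                cards.append(current)
            current = None
        elif current is not None and line:
            current.append(line)
    return cards


def property_name(line):
    """Return the upper-cased property name without group prefix or parameters."""
    head = line.split(":", 1)[0]      # e.g. "item1.EMAIL;TYPE=INTERNET"
    name = head.split(";", 1)[0]      # e.g. "item1.EMAIL"
    return name.split(".")[-1].upper()


def has_usable_address(card):
    """Return True if the card has a non-empty EMAIL or TEL property."""
    for line in card:
        if ":" not in line:
            continue
        value = line.split(":", 1)[1].strip()
        if property_name(line) in {"EMAIL", "TEL"} and value:
            return True
    return False
```

Counting the cards for which has_usable_address returns True gives a quick sense of whether the contact export is worth importing at all.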
Publishing to your Exodus destination platform(s) will vary by platform and available tools. It's been suggested to Google that they work with major providers to facilitate this, including respecting privacy settings where appropriate. You are encouraged to provide similar feedback.
Data migration will take considerable time if you plan on making the data public. Those who've ... had the pleasure of going through related processes a few times know that it can take months for one individual, even at the level of a few hundred items. You may find it simpler to just abandon a large archive of > 1,000 articles or so.
It is not necessary to convert all content in advance, so the process can be completed post-migration.
Assessing the scope of this task should be part of your pre-migration planning phase.
Third Party Tools
Independent tools created by third parties are beginning to appear, be announced, or enter development. Use of these entails risk of exposure of your Google account, your data, and third parties' data. Please be aware of this.
As of 30 November 2018, status of all these tools should be considered beta/experimental/in development unless specifically otherwise noted.
- Tim Berners-Lee / Solid: solid-takeout-importer.
- Filip H.S. "FiXato" Slagter: Plexodus-Tools.
- Friends+Me: Google+ Exporter.
- Spencer Salyer's Communities-based exporter, gplus-archiver.
A list of tools that would be useful includes:
- Takeout.G+Streams.Posts -> Blogger
- Takeout.G+Streams.Posts -> Atom
- Takeout.G+Streams.Posts -> Wordpress
- Takeout.G+Streams.Posts -> Reddit
- Takeout.G+Streams.Posts -> Other platforms that have an import or post API
- Takeout.G+Streams.Posts -> Static HTML as a better alternative to that provided by Google.
- Takeout.G+Streams.Posts.html -> Extract <body> section to files.
- Takeout.G+Streams.Circles -> Enhanced vCard/CSV with additional data via G+API.people.get
- Takeout - fix the filenames to deal with UTF-8 characters
Suggested by Julian Bond.
References
- Google Account Help: Download your data. Official Google Data Takeout documentation.
- Bernhard Suter's multi-part Google+ Migration series. This is very strongly recommended.