Throughout the lifespan of a graph database application, the application itself tends to have only basic requirements, namely a functioning W3C-standard SPARQL endpoint. However, as graph databases become embedded in critical business applications, both the business and its operations teams require much more. Critical business infrastructure must not only function but also be highly available, secure, scalable, and cost-effective. These requirements are driving the move from on-premises or self-hosted solutions to a fully managed graph database service such as Amazon Neptune.
Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run business-critical graph database applications. Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Neptune is designed to be highly available, with read replicas, point-in-time recovery, continuous backup to Amazon Simple Storage Service (Amazon S3), and replication across Availability Zones. Neptune is secure with support for AWS Identity and Access Management (IAM) authentication, HTTPS-encrypted client connections, and encryption at rest. Neptune also provides a variety of instance types, including low-cost instances targeted at development and testing, which provide a predictable, low-cost, managed infrastructure.
If you choose to migrate an existing on-premises or self-hosted graph database to Neptune, what's the best way to perform the migration?
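This post demonstrates how to migrate from the open-source RDF triplestore Blazegraph to Neptune by completing the following steps:
- Export the data from Blazegraph.
- Upload the exported data files to an Amazon S3 bucket.
- Bulk load the files into Neptune and validate the migrated data.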
This post also examines the differences you need to be aware of while migrating between the two databases. Although this post is targeted at those migrating from Blazegraph, the approach is generally applicable for migration from other RDF triplestore databases.
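Before covering the migration process, let's examine the fundamental building blocks of the architecture used throughout this post. This architecture consists of four main components:
- A Blazegraph instance, the source of the data being migrated
- An S3 bucket that stages the exported data files
- A Neptune DB cluster, the migration target
- A Neptune Workbench notebook for running the bulk load and validation queries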
The following diagram summarizes these resources and illustrates the solution architecture.
Although it's possible to construct the required AWS infrastructure manually through the AWS Management Console or the AWS Command Line Interface (AWS CLI), this post uses an AWS CloudFormation template to create most of the required infrastructure.
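The process of exporting data from Blazegraph involves three steps:
- Export the data from Blazegraph in a format that the Neptune bulk loader supports.
- Create an S3 bucket in the same Region as the target Neptune instance.
- Upload the exported data files to the S3 bucket.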
The first step is exporting the data out of Blazegraph in a format that's compatible with the Neptune bulk loader. For more information about supported formats, see RDF Load Data Formats.
Depending on how the data is stored in Blazegraph (triples or quads) and how many named graphs are in use, Blazegraph may require that you perform the export process multiple times and generate multiple data files. If the data is stored as triples, you need to run one export for each named graph. If the data is stored as quads, you may choose to either export data in N-Quads format or export each named graph in a triples format. For this post, you export a single namespace as N-Quads, but you can repeat the process for additional namespaces or desired export formats.
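If your data is stored as triples across several named graphs, each graph needs its own export. The following minimal Python sketch shows one way to generate those per-graph queries, assuming you first list the graphs with the SELECT DISTINCT ?g query shown in the code against the same endpoint. The helper names graph_export_query and graph_export_queries are hypothetical and not part of Blazegraph.

```python
# Hypothetical helpers for generating one export query per named graph.

# Lists every named graph that contains at least one statement.
LIST_GRAPHS_QUERY = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }"

# Characters that cannot appear inside an IRI written as <...> in SPARQL.
_FORBIDDEN_IRI_CHARS = "<>\"{}| ^`\\"


def graph_export_query(graph_iri: str) -> str:
    """Return a CONSTRUCT query that exports the triples of a single named graph."""
    if any(c in graph_iri for c in _FORBIDDEN_IRI_CHARS):
        raise ValueError(f"not a valid IRI for embedding: {graph_iri!r}")
    # Doubled braces are literal braces inside the f-string.
    return f"CONSTRUCT {{ ?s ?p ?o }} WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"


def graph_export_queries(graph_iris):
    """Map each named graph IRI to its export query."""
    return {iri: graph_export_query(iri) for iri in graph_iris}
```

Each generated query can then be sent to the SPARQL endpoint with a triples-format Accept header, writing one output file per graph.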
There are two recommended methods for exporting data from Blazegraph. Which one you choose depends on whether the application needs to be online and available during the migration.
If it must be online, we recommend using SPARQL CONSTRUCT queries. With this option, you need to install, configure, and run a Blazegraph instance with an accessible SPARQL endpoint.
If the application doesn't need to be online, we recommend using the Blazegraph Export utility. With this option, you must download Blazegraph, and the data file and configuration files must be accessible, but the server doesn't need to be running.
A SPARQL CONSTRUCT query returns an RDF graph built from a template that you specify in the query. For this use case, you use a CONSTRUCT query to export your data one namespace at a time with the following query:
Although a variety of RDF tools can export this data, the easiest way to run this query is through the REST API endpoint provided by Blazegraph. The following Python (3.6+) script exports data as N-Quads:
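A minimal sketch of such an export script, using only the Python standard library, might look like the following. It posts a CONSTRUCT WHERE { ?s ?p ?o } query to the namespace's SPARQL endpoint and streams the response to a file in chunks. The endpoint layout (http://host:9999/blazegraph/namespace/NAMESPACE/sparql) matches a default Blazegraph install; the N-Quads MIME type text/x-nquads is an assumption, so confirm the value your Blazegraph version accepts. The function names are hypothetical.

```python
"""Sketch: stream a Blazegraph namespace to a local file with a SPARQL CONSTRUCT query."""
import urllib.parse
import urllib.request

# Exports every statement visible in the namespace.
EXPORT_QUERY = "CONSTRUCT WHERE { ?s ?p ?o }"

# Assumed MIME type for N-Quads; check the Blazegraph REST API docs for your version.
NQUADS_MIME = "text/x-nquads"


def build_export_request(blazegraph_uri: str, namespace: str,
                         accept: str = NQUADS_MIME) -> urllib.request.Request:
    """Build a form-encoded POST request against the namespace's SPARQL endpoint."""
    url = f"{blazegraph_uri.rstrip('/')}/namespace/{namespace}/sparql"
    body = urllib.parse.urlencode({"query": EXPORT_QUERY}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Accept": accept,
                 "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def export(blazegraph_uri: str, namespace: str, output_file: str,
           accept: str = NQUADS_MIME, chunk_size: int = 1 << 20) -> None:
    """Stream the query result to output_file without holding it all in memory."""
    request = build_export_request(blazegraph_uri, namespace, accept)
    with urllib.request.urlopen(request) as response, open(output_file, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
```

For example, export("http://localhost:9999/blazegraph", "kb", "export.nq") would write the kb namespace to export.nq, assuming a server on the default port. For triples, pass a different accept value.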
If the data is stored as triples, you need to change the Accept header parameter to export data in an appropriate format (N-Triples, RDF/XML, or Turtle) using the values specified on the GitHub repo.
Although exporting through the REST API is one way to export your data, it requires a running server with sufficient resources to handle the additional query load. This isn't always possible, so how do you export an offline copy of the data?
For those use cases, you can use the Blazegraph Export utility to get an export of the data.
Blazegraph includes a utility class for exporting data: ExportKB. This utility facilitates exporting data from Blazegraph, but unlike the previous method, the server must be offline while the export is running. This makes it the ideal method when you can take the application offline during the migration, or when the migration can run from a backup of the data.
You run the utility via a Java command line from a machine that has Blazegraph installed but not running. The easiest way to run this command is to download the latest blazegraph.jar release located on GitHub. Running this command requires several parameters:
For example, if you have the Blazegraph journal file and properties files, export data as N-Quads with the following code:
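If you prefer to drive the export from a script, the following Python sketch assembles the Java command and runs it with subprocess. The class name com.bigdata.rdf.sail.ExportKB and the -outdir and -format options reflect our understanding of the utility and are assumptions; check the usage text that ExportKB prints for your Blazegraph release before relying on them. The helper names are hypothetical, and optional JVM settings such as a log4j configuration are omitted.

```python
"""Sketch: assemble and run the Blazegraph ExportKB command from Python."""
import subprocess
from pathlib import Path

# Assumed fully qualified name of the ExportKB utility class inside blazegraph.jar.
EXPORT_KB_CLASS = "com.bigdata.rdf.sail.ExportKB"


def build_exportkb_command(jar: Path, properties_file: Path, out_dir: Path,
                           namespace: str = "kb", fmt: str = "N-Quads") -> list:
    """Return the argv list; the -outdir and -format flags are assumptions to verify."""
    return [
        "java", "-cp", str(jar), EXPORT_KB_CLASS,
        "-outdir", str(out_dir),
        "-format", fmt,
        str(properties_file),  # journal properties file, e.g. RWStore.properties
        namespace,
    ]


def run_export(jar: Path, properties_file: Path, out_dir: Path,
               namespace: str = "kb") -> None:
    """Run the export; any Blazegraph server using this journal must be stopped first."""
    out_dir.mkdir(parents=True, exist_ok=True)
    cmd = build_exportkb_command(jar, properties_file, out_dir, namespace)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```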
Upon successful completion, you see a message similar to the following code:
No matter which option you choose, you can successfully export your data from Blazegraph in a Neptune-compatible format. You can now move on to migrating these data files to Amazon S3 to prepare for bulk load.
With your data exported from Blazegraph, the next step is to create a new S3 bucket. This bucket holds the data files exported from Blazegraph for the Neptune bulk loader to use. Because the Neptune bulk loader requires low-latency access to the data during the load, the bucket must be located in the same Region as the target Neptune instance. Other than the bucket's location, no additional configuration is required.
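If you script this step, a minimal sketch using boto3 (our choice here; the post doesn't prescribe a tool) might look like the following. It encodes the one requirement above, that the bucket lives in the target Neptune Region, plus an S3 quirk: for us-east-1, the location constraint must be omitted. The function names are hypothetical, and boto3 is imported only when a bucket is actually created.

```python
"""Sketch: create the staging bucket in the same Region as the Neptune cluster."""


def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Build CreateBucket arguments; us-east-1 must omit the location constraint."""
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket: str, region: str) -> None:
    """Create the bucket; credentials come from the standard AWS configuration chain."""
    import boto3  # requires: pip install boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket, region))
```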
You can create a bucket in a variety of ways:
You use the newly created S3 bucket location to bulk load the data into Neptune.
The next step is to upload your data files from your export location to this S3 bucket. As with the bucket creation, you can do this in the following ways:
Although this example code only loads a single file, if you exported multiple files, you need to upload each file to this S3 bucket.
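To handle several exported files in one pass, a small script can walk the export directory and upload each file. The following sketch assumes boto3 is installed and AWS credentials are configured; plan_uploads and upload_all are hypothetical helper names. Keeping all files under a common key prefix lets a single bulk load point at that prefix.

```python
"""Sketch: upload every exported file under a local directory to the staging bucket."""
from pathlib import Path


def plan_uploads(export_dir: str, prefix: str = "") -> list:
    """Pair each local file with the S3 key it will be stored under, in sorted order."""
    root = Path(export_dir)
    plan = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        # Drop an empty prefix so keys never start with "/".
        key = "/".join(filter(None, [prefix.strip("/"), relative]))
        plan.append((str(path), key))
    return plan


def upload_all(export_dir: str, bucket: str, prefix: str = "") -> None:
    """Upload each planned file; upload_file switches to multipart for large files."""
    import boto3  # requires: pip install boto3

    s3 = boto3.client("s3")
    for local_path, key in plan_uploads(export_dir, prefix):
        s3.upload_file(local_path, bucket, key)
```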
After uploading all the files to your S3 bucket, you're ready for the final task of the migration: importing data into Neptune.
Because you exported your data from Blazegraph and made it available in Amazon S3, your next step is to import the data into Neptune. Neptune has a bulk loader that loads data faster and with less overhead than performing load operations using SPARQL. You start a bulk load by calling the loader endpoint API, which loads the data stored in the specified S3 bucket into Neptune. This loading process happens in three steps:
- Start the bulk load.
- Monitor the load job until it completes.
- Validate the migrated data.
The following diagram illustrates how we will perform these steps in our AWS infrastructure.
You begin the import process by sending Neptune a request to start the bulk load. You can do this with a direct call to the loader REST endpoint, but only from within the private VPC in which the target Neptune instance runs. You could set up a bastion host, SSH into it, and run a cURL command, but Neptune Workbench is an easier option.
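For reference, a direct call to the loader endpoint from a host inside the VPC might look like the following Python sketch. It assumes IAM database authentication is disabled (with it enabled, requests must be signed with Signature Version 4), that an IAM role allowing Neptune to read the bucket is already attached to the cluster, and that the files are N-Quads. build_load_request and start_load are hypothetical helpers; the body fields source, format, iamRoleArn, region, and failOnError are Neptune loader API parameters.

```python
"""Sketch: start a Neptune bulk load by calling the loader REST endpoint directly."""
import json
import urllib.request


def build_load_request(neptune_endpoint: str, source: str, iam_role_arn: str,
                       region: str, fmt: str = "nquads") -> urllib.request.Request:
    """Build the POST /loader request (IAM database authentication assumed off)."""
    body = {
        "source": source,            # s3://bucket/prefix holding the exported files
        "format": fmt,               # nquads, ntriples, rdfxml, or turtle
        "iamRoleArn": iam_role_arn,  # role Neptune assumes to read from S3
        "region": region,            # Region of the bucket and the cluster
        "failOnError": "TRUE",       # stop the job at the first error
    }
    return urllib.request.Request(
        f"https://{neptune_endpoint}:8182/loader",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_load(neptune_endpoint: str, source: str, iam_role_arn: str,
               region: str) -> str:
    """Submit the load and return its loadId; must run inside the cluster's VPC."""
    request = build_load_request(neptune_endpoint, source, iam_role_arn, region)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["payload"]["loadId"]
```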
Neptune Workbench is a Jupyter notebook, hosted on an Amazon SageMaker notebook instance, that comes preconfigured with several Neptune-specific notebook magics. These magics simplify common Neptune interactions, such as checking the cluster status, running SPARQL queries and Gremlin traversals, and running a bulk loading operation.
To start the bulk load process, use the %load magic, which provides an interface to the Neptune loader API.
The result contains the status of the request. Bulk loads are long-running processes; this response doesn't mean that the load is complete, only that it has begun. The status updates periodically to show the most recent state of the loading job until the job is complete. When loading is complete, you receive a notification of the final job status.
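Outside the notebook, you can check the same status by polling the loader endpoint with the loadId returned when the job started. A minimal sketch follows, again assuming IAM database authentication is off and the code runs inside the VPC. The set of in-progress statuses below is our assumption of which states are non-terminal; consult the loader Get-Status documentation for the full list. Helper names are hypothetical.

```python
"""Sketch: poll the Neptune loader until a bulk load reaches a terminal state."""
import json
import time
import urllib.request

# Statuses treated as "still queued or running" (assumed; verify against the docs).
IN_PROGRESS = {"LOAD_NOT_STARTED", "LOAD_IN_QUEUE", "LOAD_IN_PROGRESS"}


def overall_status(response: dict) -> str:
    """Extract payload.overallStatus.status from a GET /loader/{loadId} response."""
    return response["payload"]["overallStatus"]["status"]


def wait_for_load(neptune_endpoint: str, load_id: str,
                  interval_s: float = 30.0) -> str:
    """Poll until the job leaves the in-progress states, then return its status."""
    url = f"https://{neptune_endpoint}:8182/loader/{load_id}"
    while True:
        with urllib.request.urlopen(url) as resp:
            status = overall_status(json.load(resp))
        if status not in IN_PROGRESS:
            return status  # e.g. LOAD_COMPLETED or LOAD_FAILED
        time.sleep(interval_s)
```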
With the loading job completed successfully, your data is in Neptune and you're ready to move on to the final step of the import process: validating the data migration.
As with any data migration, you can validate that the data migrated correctly in several ways. The right checks tend to depend on the data you're migrating, the confidence level required for the migration, and what matters most in your particular domain. In most cases, these validation efforts involve running queries that compare data values before and after the migration.
To make this easier, the Neptune Workbench notebook has a magic (%%sparql) that simplifies running SPARQL queries against your Neptune cluster. See the following code.
This Neptune-specific magic runs SPARQL queries against the associated Neptune instance and returns the results in tabular form.
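As one example of a before-and-after check, the following sketch counts statements per named graph on both SPARQL endpoints and reports any graph whose counts differ. The helper names and the choice of per-graph counts as the metric are assumptions for illustration; how default-graph data is counted can differ between the two stores, so adapt the query to how your data is stored.

```python
"""Sketch: compare per-graph statement counts between Blazegraph and Neptune."""
import json
import urllib.parse
import urllib.request

COUNT_BY_GRAPH = (
    "SELECT ?g (COUNT(*) AS ?count) "
    "WHERE { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g"
)


def query_json(sparql_url: str, query: str) -> dict:
    """POST a SELECT query and return the SPARQL JSON results document."""
    request = urllib.request.Request(
        sparql_url,
        data=urllib.parse.urlencode({"query": query}).encode("utf-8"),
        headers={"Accept": "application/sparql-results+json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def counts_by_graph(results: dict) -> dict:
    """Turn SPARQL JSON bindings into {graph IRI: statement count}."""
    return {
        b["g"]["value"]: int(b["count"]["value"])
        for b in results["results"]["bindings"]
    }


def diff_counts(before: dict, after: dict) -> dict:
    """Return {graph: (before, after)} for every graph whose counts differ."""
    graphs = set(before) | set(after)
    return {g: (before.get(g, 0), after.get(g, 0))
            for g in graphs if before.get(g, 0) != after.get(g, 0)}
```

For example, run counts_by_graph(query_json(url, COUNT_BY_GRAPH)) against http://localhost:9999/blazegraph/namespace/kb/sparql and against https://your-cluster-endpoint:8182/sparql, then pass both dictionaries to diff_counts; an empty result means the per-graph counts match.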
The last thing you need to investigate is any application changes that you may need to make due to the differences between Blazegraph and Neptune. Luckily, both Blazegraph and Neptune are compatible with SPARQL 1.1, meaning that you can change your application configuration to point to your new Neptune SPARQL endpoint, and everything should work.
However, as with any database migration, there are several differences between the Blazegraph and Neptune implementations that may affect your migration. The following major differences may require changes to queries, to the application architecture, or to both as part of the migration process:
Neptune also offers several additional features that Blazegraph doesn't:
This post examined the process for migrating from an on-premises or self-hosted Blazegraph instance to a fully managed Neptune database. A migration to Neptune not only satisfies the development requirements of many applications but also satisfies the operational requirements of business-critical applications. Additionally, this migration unlocks many advantages, including cost optimization, better integration with native cloud tools, and a lower operational burden.
It's our hope that this post gives you the confidence to begin your migration. If you have any questions, comments, or other feedback, we're always available through your Amazon account manager or via the Amazon Neptune Discussion Forums.
Dave Bechberger is a Sr. Graph Architect with the Amazon Neptune team. He used his years of experience working with customers to build graph database-backed applications as inspiration to co-author Graph Databases in Action by Manning.
Moving to the cloud: Migrating Blazegraph to Amazon Neptune - idk.dev