VMware Cloud on AWS VPC




Public Cloud Hyperscalers Comparison


There are only three main global public cloud vendors: AWS, Azure, and Google Cloud. All three have very interesting competitive advantages in the global enterprise market; not Pokémon Go, though I recall a Google Cloud SE talking about Pokémon Go at an enterprise meetup. Shssss.


  • AWS
    • Advantage
      • First to global market and the absolute dominant leader in public cloud, with the most advanced, feature-rich platform, at least 10 years ahead of Azure and Google Cloud. The only option if you are building a global-scale app.
    • Disadvantage
      • Incredibly complex and expensive to run workloads and designs that have not been optimised for AWS.
      • Lack of enterprise experience; Agile and DevOps are often just nice buzzwords in the corporate world, and the reality is very different.
      • Most enterprise workloads will require complete refactoring for migration, although VMware integration and NetApp Cloud Volumes will make enterprise workload migration a lot easier.
      • Lock-in architecture: once you build an AWS-native app, it will be very difficult to migrate out.
      • Not all services hold up once you get into the devils-in-the-details of advanced enterprise features. AWS WAF, for example, is essentially a version of the open-source ModSecurity, but it is very difficult to customise and cannot compete with the feature set of an F5 WAF.
      • AWS people are expensive. (Like me.)
  • Azure
    • Advantage
      • Every single enterprise customer already uses most Microsoft products: Microsoft Office 365, Microsoft Active Directory, Microsoft Windows operating systems, Microsoft Storage Server, Microsoft Azure Stack, Microsoft Azure AD SSO. (These technologies provide the stickiness for Azure.)
      • Microsoft Windows, Microsoft Active Directory, and Microsoft Office 365 are used by almost every corporate customer in the world. As customers transition from on-premises to cloud and SaaS, they will move workloads to Office 365 and Azure AD, and then set up a tenant on Azure, making it a very easy transition.
      • Microsoft also restricts some applications and operating systems, via licensing, on shared compute platforms other than its own Azure platform. E.g. Microsoft RDS and Windows 10 are only allowed on Azure. There are many other complex licensing issues that you will only figure out by reading all the licensing legal documents. (I have a number of articles discussing this on this blog.)
      • Microsoft is also enabling on-premises Azure Stack, which will make it easy to deploy and transition from on-premises to Azure, including its own Microsoft Storage Server.
    • Disadvantage
      • The Azure console is not as feature rich, and available features are often rolled out in beta, which can cause a lot of headaches if you are not experienced enough to recognise it.
      • Microsoft technology takes a great deal of expertise to maintain.
  • Google Cloud
    • Advantage
      • Google PWAs, Google Chrome, Google Dart, Google Firebase.
      • Google services run on massive infrastructure globally and, just like Amazon, their primary customer is themselves.
      • They are taking a different approach to gaining market share. As Google provides the most widely used browser, they are pushing PWAs for development, and the whole Google Cloud platform is very much accessible from a developer's IDE. It is very easy to start creating a multi-platform application using a Google framework such as Angular and spin up services using Google Firebase. Connecting the developer's IDE directly to the Google Cloud platform makes it a very easy option for DevOps and for developing MVPs.
    • Disadvantage
      • Late to the game; they are not moving as fast as AWS or Azure in terms of feature releases.
      • Google is a search and advertising company; moving into cloud/datacentre infrastructure and enterprise applications is a giant leap. They will need to hire enterprise presales.


There are still plenty of years left in traditional data centre technologies and the newly emerging scale-out and management platforms. You can easily design a server infrastructure with the latest tech at a tenth of the cost of AWS and then sweat that asset for 10+ years. I worked on IBM non-stop servers and they are still going after 20+ years. That is a pretty good ROI for apps that don't need to scale out.
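As a rough sketch of that "sweat the asset" argument, here is the kind of back-of-the-envelope comparison worth doing. Every figure below is a hypothetical placeholder, not a quote from any vendor:

```python
# Illustrative cost comparison: owned hardware amortised over its lifetime
# versus a recurring cloud bill. All figures are hypothetical placeholders.

def cumulative_owned_cost(capex, annual_opex, years):
    """Total cost of buying hardware up front and running it for `years`."""
    return capex + annual_opex * years

def cumulative_cloud_cost(monthly_bill, years):
    """Total cost of renting equivalent capacity for `years`."""
    return monthly_bill * 12 * years

# Hypothetical numbers: $50k of servers plus $10k/yr power/colo/support,
# versus a $4k/month cloud bill for comparable capacity.
owned = cumulative_owned_cost(50_000, 10_000, 10)  # 150,000 over 10 years
cloud = cumulative_cloud_cost(4_000, 10)           # 480,000 over 10 years
```

The crossover obviously depends entirely on your workload's shape; this only pays off for steady loads that don't need to scale out.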

Enterprise architecture for digital transformation is required. A CIO who says everything needs to go to AWS is getting paid too much money; anyone can say that. You need a proper assessment of your business, future strategy, and current workloads.

Australian Government – Digital Transformation Strategy



  • https://www.dta.gov.au/what-we-do/policies-and-programs/secure-cloud/
  • https://www.dta.gov.au/files/cloud-strategy/secure-cloud-strategy.pdf
  • Principle 1: Make risk-based decisions when applying cloud security
  • Principle 2: Design services for the cloud
  • Principle 3: Use public cloud services as the default
  • Principle 4: Use as much of the cloud as possible
  • Principle 5: Avoid customisation and use services ‘as they come’
  • Principle 6: Take full advantage of cloud automation practices
  • Principle 7: Monitor the health and usage of cloud services in real time
  • Initiative 1: Agencies must develop their own cloud strategy
  • Initiative 2: Implement a layered certification model
  • Initiative 3: Redevelop the Cloud Services Panel to align with the procurement recommendations for a new procurement pathway that better supports cloud commodity purchases
  • Initiative 4: Create a dashboard to show service status for adoption, compliance status and services panel status and pricing
  • Initiative 5: Create and publish cloud service qualities baseline and assessment capability
  • Initiative 6: Build a cloud responsibility model supported by a cloud contracts capability
  • Initiative 7: Establish a whole-of-government cloud knowledge exchange
  • Initiative 8: Expand the Building Digital Capability program to include cloud skills
  • Myth 1: The Cloud is not as secure as on premise services
  • Myth 2: Privacy reasons mean government data cannot reside offshore.
  • “Generally, no. The Privacy Act does not prevent an Australian Privacy Principle (APP) entity from engaging a cloud service provider to store or process personal information overseas. The APP entity must comply with the APPs in sending personal information to the overseas cloud service provider, just as they need to for any other overseas outsourcing arrangement. In addition, the Office of the Australian Information Commissioner’s Guide to securing personal information: ‘Reasonable steps’ to protect personal information discusses security considerations that may be relevant under APP 11 when using cloud computing.” https://www.oaic.gov.au/agencies-and-organisations/agency-resources/privacy-agency-resource-4-sending-personalinformation-overseas Additionally, APP 8 provides the criteria for cross-border disclosure of personal information, which ensures the right practices for data residing off-shore are in place. Our Australian privacy frameworks establish the accountabilities to ensure the appropriate privacy and security controls are in place to maintain confidence in our personal information in the cloud.

  • Myth 3: Information in the cloud is not managed properly and does not comply with record keeping obligations


Cloud Migration Example


This is by far one of the best cloud migration examples; I'm keeping a copy here for future reference. It uses ZeroTier and Consul. You could use VMware NSX and velocloud.com for the same function.



About 6 months ago (in a galaxy pretty close to our office) …

Our old hosting provider was having network issues… again. There had been a network split around 3:20 AM, which had caused a few of our worker servers to become disconnected from the rest of our network. The background jobs on those workers kept trying to reach our other services until their timeout was reached, and they gave up.

This had already been the second incident in that month. Earlier, a few of our servers had been rebooted without warning. We were lucky that these servers were part of a cluster that could handle suddenly failing workers gracefully. We had taken care that rebooted servers would start up all their services in the right order and would rejoin the cluster without manual intervention.

However, if we had been unlucky and, say, our main database server had been restarted without warning, then we would have had some downtime and, potentially, would have had to fail over manually to our secondary database server.

We kept joking about how the flakiness of our current hosting provider was a great “Chaos Monkey”-like service which forced us to make sure that we had proper retry-policies and service start-up sequences in place everywhere.

But there were also other issues: booting up new machines was a slow and manual process, with few possibilities for automation. The small maximum machine size also started to become an inconvenience, and, lastly, they only had datacenters in the Netherlands, while we kept growing internationally.

It was clear that we needed to do something about the situation.

Which cloud to go to?

Our requirements for a new hosting provider made it pretty clear that we would have to move to one of the three big cloud providers if we wanted to fulfill all of them. One of the important things for us was an improved DevOps experience that would allow us to move faster. We needed to be able to spin up new boxes with the right image in seconds. We needed a fully private network that we could configure dynamically. We needed to be flexible in both storage and compute options and be able to scale both of them up and down as necessary. Additional hosted services (e.g. log aggregation and alerting) would also be nice to have. But, most importantly, we needed to be able to control and automate all of this with a nice API.

We had already been using Google Cloud Storage (GCS) in the past and were very content with it. The main reason for us to go with GCS had been the possibility to configure it to be strongly consistent, which made things easier for us. Therefore, we had a slight bias towards Google Cloud Platform (GCP) from the start but still decided to evaluate AWS and Azure for our use case.

Azure fell out pretty quickly. It just seemed too rough around the edges and some of us had used it for other projects and could report that they had cut their fingers on one thing or another. With AWS, the case was different, since it has everything and the kitchen sink. A technical problem was the lack of true strong consistency for S3. While it does provide read-after-write consistency for new files, it only provides eventual consistency for overwrite PUTs and for DELETEs.

Another issue was the price-performance ratio: for our workload, it looked like AWS was going to be at least twice as expensive as GCP for the same performance. While there are a lot of tricks one can use to get a lower AWS bill, they are all rather complex and either require you to get into the business of speculating on spot instances or to commit for a long time to specific instances, both of which we would rather avoid. With GCP, the pricing is very straightforward: you pay a certain base price per instance per month, and you get a discount on that price of up to 30% for sustained use. In practice: if you run an instance 24/7, you end up paying about 70% of the "regular" price.
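The sustained-use mechanics can be sketched as a tiered model: each successive quarter of the month's usage is billed at a lower incremental rate. The 100/80/60/40 percent tiers below match what Google documented for sustained-use discounts at the time, but treat them as illustrative and check current pricing:

```python
def sustained_use_multiplier(usage_fraction, rates=(1.0, 0.8, 0.6, 0.4)):
    """Effective price multiplier when an instance runs `usage_fraction`
    of the month and each successive usage bracket is billed at a lower
    incremental rate."""
    bracket = 1.0 / len(rates)
    billed = 0.0
    for i, rate in enumerate(rates):
        # How much of this bracket was actually used, clamped to [0, bracket].
        used = max(0.0, min(usage_fraction, (i + 1) * bracket) - i * bracket)
        billed += used * rate
    return billed / usage_fraction if usage_fraction else 0.0

# Running 24/7 all month: 0.25*1.0 + 0.25*0.8 + 0.25*0.6 + 0.25*0.4 = 0.70,
# i.e. the "up to 30%" sustained-use discount.
```

Half a month of usage comes out at a 0.9 multiplier, so the discount ramps up automatically with no commitment required.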

Given that Google also offers great networking options, has a well-designed API with an accompanying command-line client, and has datacenters all over the world, the choice was simple: we would be moving to GCP.

How do we get there?

After the decision had been taken, the next task was to figure out how we would move all of our data and all of our services to GCP. This would be a major undertaking and require careful preparation, testing, and execution. It was clear that the only viable path would be a gradual migration of one service after another. The “big bang” migration is something we had stopped doing a long time ago after realizing that, even with only a handful of services and a lot of preparation and testing, this is very hard to get right. Additionally, there is often no easy path to rollback after you pulled the trigger, leading to frantic fire-fighting and stressed engineers.

The requirements for the migration were thus as follows:

  • as little downtime as possible
  • possibility to gradually move one service after the other
  • testing of individual services as well as integration tests of several services
  • clear rollback path for each service
  • continuous data migration

This daunting list had a few implications:

  • We would need to be able to securely communicate between the old and the new datacenter (let’s call them dc1 and dc2)
  • The latency and the throughput between the two would need to be good enough that we could serve frontend requests across datacenters
  • Internal DNS names needed to be resolved between datacenters (and there could be no DNS name clashes)
  • And, we would have to come up with a way to continuously sync data between the two until we were ready to pull the switch

A plan emerges

After mulling this over for a bit, we started to have a good idea how to go about it. One of the key ingredients would be a VPN that would span both datacenters. The other would be proper separation of services on the DNS level.

On the VPN side, we wanted to have one big logical network where every service could talk to every other service as if they were in the same datacenter. Additionally, it would be nice if we wouldn’t have to route all traffic through the VPN. If two servers were in the same datacenter, it would be better if they could talk to each other directly through the local network.

Given that we don’t usually spend all day configuring networks, we had to do some research first to find the best solution. We talked to another startup that was using a similar setup, and they were relying on heavy-duty networking hardware that had built-in VPN capabilities. While this was working really well for them, it was not really an option for us. We had always been renting all of our hardware and had no intention of changing that. We would have to go with a software solution.

The first thing we looked at was OpenVPN. It’s the most popular open-source VPN solution, and it has been around for a long time. We had even been using it for our office network for a while and had some experience with it. However, our experience had not been particularly great. It had been a pain to configure and getting new machines online was more of a hassle than it should have been. There were also some connectivity issues sometimes where we would have to restart the service to fix the problem.

We started looking for alternatives and quickly stumbled upon zerotier.com, a small startup that had set out to make using VPNs user-friendly and simple. We took their software for a test ride and came away impressed: it literally took 10 minutes to connect two machines, and it did not require us to juggle certificates ourselves. In fact, the software is open-source and they do provide signed DEB and RPM packages on their site.

The best part of ZeroTier, however, is its peer-to-peer architecture: nodes in the network talk directly to each other instead of through some central server, and we measured very low latencies and high throughput because of it. This had been another concern with OpenVPN, since its gateway server could have become a bottleneck between the two datacenters. The only caveat with ZT is that it requires a central server for the initial connection to a new node; all traffic after that initial handshake is peer-to-peer.

With the VPN in-place, we needed to take care of the DNS and service discovery piece next. Fortunately, this one was easy: we had been using Hashicorp’s Consul pretty much from the beginning and knew that it had multi-datacenter capabilities. We only needed to find out how to combine the two.

The dream team: Consul and ZeroTier

Getting ZeroTier up and running was really easy:

  • First install the zerotier-one service via apt on each server (automate this with your tool of choice).
  • Then, issue sudo zerotier-cli join the_network_id once to join the VPN.
  • Finally, you have to authorize each server in the ZT web interface by checking a box (this step can also be automated via their API, but this was not worth the effort for us).
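The authorization step can indeed be scripted against the ZeroTier Central API (a POST to the member endpoint that sets `config.authorized`). A hedged sketch follows: the network ID, node ID, and token are placeholders, and you should verify the endpoint shape against ZeroTier's current API documentation before relying on it:

```python
import json

# ZeroTier Central API base URL (hosted controller).
API_BASE = "https://my.zerotier.com/api"

def build_authorize_request(network_id, node_id, api_token):
    """Build (url, headers, body) for authorizing a member on a network."""
    url = f"{API_BASE}/network/{network_id}/member/{node_id}"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"config": {"authorized": True}})
    return url, headers, body

# Sending it is a plain HTTP POST, e.g. with urllib.request:
#   req = urllib.request.Request(url, body.encode(), headers, method="POST")
#   urllib.request.urlopen(req)
```

For a handful of servers the web-interface checkbox is genuinely less effort, which is why we skipped automating this.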

This will create a new virtual network interface on each server:

robert@example ~ % ip addr
3: zt0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether ff:11:ff:11:ff:11 brd ff:ff:ff:ff:ff:ff
    inet brd scope global zt0

The IP address will be assigned automatically a few seconds after authorizing the server. Each server then has two network interfaces, the default one (e.g. ens4) and the ZT one, called zt0. They will be in different subnets, e.g. 10.132.x.x and 10.144.x.x, where the first one is the private network inside of the Google datacenter and the second is the virtual private network created by ZT, which spans across both dc1 and dc2. At this point, each server in dc1 is able to ping each server in dc2 on their ZT interface.
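With two interfaces per server, a small helper for telling the networks apart is handy when reading logs or DNS answers. The subnets below are the example ranges from the text, not real allocations:

```python
import ipaddress

# Example subnets from the setup above: the GCE private network and the
# ZeroTier overlay network that spans both datacenters.
NETWORKS = {
    "gce-local": ipaddress.ip_network("10.132.0.0/16"),
    "zerotier": ipaddress.ip_network("10.144.0.0/16"),
}

def classify(addr):
    """Name the network an IP address belongs to, or 'unknown'."""
    ip = ipaddress.ip_address(addr)
    for name, net in NETWORKS.items():
        if ip in net:
            return name
    return "unknown"
```

This mirrors the mental model used throughout the rest of the migration: same subnet means direct local traffic, different subnet means the ZT overlay.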

It would be possible to run all traffic over the ZT network, but, for two servers that are anyway in the same datacenter, this would be a bit wasteful due to the (small) overhead introduced by ZT. We, therefore, looked for a way to advertise a different IP address depending on who was asking. For cross-datacenter DNS requests, we wanted to resolve to the ZT IP address, and, for in-datacenter DNS requests, we wanted to resolve to the local network interface.

The good news here is that Consul supports this out-of-the-box! Consul works with JSON configuration files for each node and service. An example of the config for a node is the following:

robert@example:/etc/consul$ cat 00-config.json
{
  "dns_config": {
    "allow_stale": true,
    "max_stale": "10m",
    "service_ttl": {
      "*": "5s"
    }
  },
  "server": false,
  "bind_addr": "",
  "datacenter": "dc2",
  "advertise_addr": "",
  "advertise_addr_wan": "",
  "translate_wan_addrs": true
}

Consul relies on the datacenter to be set correctly if it is used for both LAN and WAN requests. The other important flags here are:

  • advertise_addr: the address to advertise over LAN (the local one in our case)
  • advertise_addr_wan: the address to advertise over WAN (the ZT one in our case)
  • translate_wan_addrs: enable this to return the WAN address for nodes in a remote datacenter
  • bind_addr: make sure this is 0.0.0.0 (which is the default) so that Consul listens on all interfaces

After applying this setup to all nodes in each datacenter, you should now be able to reach each node and service across datacenters. You can test this by e.g. doing dig node_name.node.dc1.consul once from a machine in dc1 and once from a machine in dc2, and they should then respond with the local and with the ZT addresses respectively.
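The behaviour that dig test exercises can be modelled in a few lines: a node advertises both addresses, and the answer depends on whether the querier sits in the same datacenter. The node data here is invented for illustration:

```python
# Toy model of Consul's translate_wan_addrs behaviour: each node advertises
# a LAN and a WAN (ZeroTier) address, and DNS answers depend on where the
# querying agent lives. Node data below is invented.
nodes = {
    "db1": {"dc": "dc1", "lan": "10.132.0.10", "wan": "10.144.0.10"},
    "web1": {"dc": "dc2", "lan": "10.133.0.20", "wan": "10.144.0.20"},
}

def resolve(node_name, querier_dc):
    """Return the address an agent in `querier_dc` should receive."""
    node = nodes[node_name]
    if node["dc"] == querier_dc:
        return node["lan"]  # same datacenter: direct local route
    return node["wan"]      # cross-datacenter: ZeroTier overlay address
```

Real Consul does this inside its DNS interface; the sketch just makes the decision rule explicit.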

Given this setup, it is then possible to switch from a service in one datacenter to the same service in another datacenter simply by changing its DNS configuration.

Issues we ran into

As with all big projects like this, we ran into a few issues of course:

  • We encountered a Linux kernel bug that prevented ZT from working. It was easily fixed by upgrading to the latest kernel.
  • We are using Hashicorp's Vault for secret management (see our other blog post for a more in-depth explanation of how we use it). In order to make Vault work nicely with ZT, we needed to set its redirect_addr to the Consul hostname of the server it is running on, e.g. redirect_addr = "http://the_hostname.node.dc1.consul:8501". Vault advertises its redirect address in its Consul service definition by default, and this defaults to the private IP of the datacenter it is running in. Setting redirect_addr to the Consul hostname ensures that it resolves to the right address. Debugging this issue was quite the journey and required diving into the source of both Consul and Vault.
  • Another issue we ran into was that Dnsmasq is not installed by default on GCE Ubuntu images. We rely on Dnsmasq to relay *.consul domain names to Consul. It can easily be installed via apt of course.

Moving the data

While a lot of our services are stateless and could therefore easily be moved, we naturally also need to store our data somewhere and, therefore, had to come up with a plan to migrate it to its new home.

Our main datastores are Postgres, HDFS, and Redis. Each one of these needed a different approach in order to minimize any potential downtime. The migration path for Postgres was straightforward: using pg_basebackup, we could simply add another hot-standby server in the new datacenter, which would continuously sync the data from the master until we were ready to pull the switch. Before the critical moment we turned on synchronous_commit to make sure that there was no replication lag and then failed over using the trigger file mechanism that Postgres provides. This technique is also convenient if you need to upgrade your DB server, or if you need to do some maintenance, e.g. apply security updates and reboot.
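A useful sanity check before pulling the switch is confirming the replication lag really is zero. Postgres reports positions as LSN strings like '16/B374D848' (on recent versions, from pg_current_wal_lsn() on the primary and pg_last_wal_replay_lsn() on the standby), and the lag in bytes is simple arithmetic:

```python
def lsn_to_bytes(lsn):
    """Convert a Postgres LSN string 'X/Y' (both hex) to a byte offset.
    The high half counts 4 GiB segments; the low half is the offset within."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)

def replication_lag_bytes(primary_lsn, standby_lsn):
    """Bytes of WAL the standby still has to replay; 0 means fully caught up."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(standby_lsn)
```

Postgres can also do this server-side with pg_wal_lsn_diff(); the point is simply to verify a zero delta before touching the trigger file.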

For HDFS the approach was different: due to the nature of our application, we refresh all data on it at least every 24 hours. This made it possible to simply upload all of the data to the two clusters in parallel and keep them synced as well. Having the data on both the new and the old cluster allowed us to run a number of integration tests ensuring that the old and the new system returned the same results. For a while, we would submit the same jobs to both clusters and compare the results. The result from the new cluster would be discarded, but, if there was a difference, we would send an alert so that we could investigate and fix the problem. This kind of "A/B testing" was invaluable in ironing out unforeseen issues before switching over in production.

We use Redis mainly for background jobs, and we have support for pausing jobs temporarily in Jobmachine, our job scheduling system. This made the Redis move easy: We could pause jobs, sync the Redis data to disk, scp the data over to the new server, run a few integrity tests, update DNS, and then resume processing jobs.
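The "few integrity tests" step can be as simple as comparing cheap invariants between the old and the new instance before updating DNS. A sketch, with plain dicts standing in for snapshots pulled from the two Redis servers:

```python
def integrity_mismatches(old, new, sample_keys=None):
    """Compare two key/value snapshots and return human-readable differences.
    An empty list means the copies agree on count and sampled values."""
    problems = []
    if len(old) != len(new):
        problems.append(f"key count differs: {len(old)} vs {len(new)}")
    # Spot-check values: either an explicit sample or every old key.
    for key in (sample_keys or old):
        if old.get(key) != new.get(key):
            problems.append(f"value differs for {key!r}")
    return problems
```

Because jobs were paused during the copy, a clean comparison here meant it was safe to flip DNS and resume processing.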

The key in migrating our data was again to do each service individually, validate the data, test the services relying on it, and then switching over once we were sure everything was working correctly.


The issues and limitations of our old hosting provider made it necessary to look for an alternative. It was important for us that we could move all of our services and data gradually and could test and validate each step of the migration. We therefore chose to create a VPN that would span both of our datacenters using ZeroTier. In combination with Consul, this allowed us to have two instances of each service, which we could easily switch between using only a DNS update. For the data migration we made sure to duplicate all data continuously until we were sure everything was working as intended. If you are looking for an easy way to migrate from one datacenter to another, then we can highly recommend looking into both Consul and ZeroTier.

Gartner – Five Ways to Migrate Applications to the Cloud (just words though)

Rehost, i.e. redeploy applications to a different hardware environment and change the application's infrastructure configuration. Rehosting an application without making changes to its architecture can provide a fast cloud migration solution. However, the primary advantage of IaaS (teams can migrate systems quickly, without modifying their architecture) can also be its primary disadvantage, as benefits from the cloud characteristics of the infrastructure, such as scalability, will be missed.

Refactor, i.e. run applications on a cloud provider’s infrastructure. The primary advantage is blending familiarity with innovation as “backward-compatible” PaaS means developers can reuse languages, frameworks, and containers they have invested in, thus leveraging code the organization considers strategic. Disadvantages include missing capabilities, transitive risk, and framework lock-in. At this early stage in the PaaS market, some of the capabilities developers depend on with existing platforms can be missing from PaaS offerings.

Revise, i.e. modify or extend the existing code base to support legacy modernization requirements, then use rehost or refactor options to deploy to cloud. This option allows organizations to optimize the application to leverage the cloud characteristics of providers’ infrastructure. The downside is that kicking off a (possibly major) development project will require upfront expenses to mobilize a development team. Depending on the scale of the revision, revise is the option likely to take most time to deliver its capabilities.

Rebuild, i.e. rebuild the solution on PaaS: discard the code of the existing application and re-architect it. Although rebuilding means losing the familiarity of existing code and frameworks, the advantage is access to innovative features in the provider's platform. These improve developer productivity, such as tools that allow application templates and data models to be customized, metadata-driven engines, and communities that supply pre-built components. However, lock-in is the primary disadvantage: if the provider makes a pricing or technical change that the consumer cannot accept, breaches service level agreements (SLAs), or fails, the consumer is forced to switch, potentially abandoning some or all of its application assets.

Replace, i.e. discard an existing application (or set of applications) and use commercial software delivered as a service. This option avoids investment in mobilizing a development team when requirements for a business function change quickly. Disadvantages can include inconsistent data semantics, data access issues, and vendor lock-in.


The 10 Laws of Cloudonomics


In 2008, Joe Weinman, then Strategic Solutions Sales VP for AT&T Global Business Services, created the 10 Laws of Cloudonomics, which still, after two and a half years, are the foundation for the economics of Cloud Computing. We've reproduced an abridged version of the Cloudonomics laws below.

Cloudonomics Law #1: Utility services cost less even though they cost more. Although utilities cost more when they are used, they cost nothing when they are not. Consequently, customers save money by replacing fixed infrastructure with Clouds when workloads are spiky, specifically when the peak-to-average ratio is greater than the utility premium.

Cloudonomics Law #2: On-demand trumps forecasting. Forecasting is often wrong; the ability to scale up and down to meet unpredictable demand spikes allows for revenue and cost optimalities.

Cloudonomics Law #3: The peak of the sum is never greater than the sum of the peaks. Enterprises deploy capacity to handle their peak demands. Under this strategy, the total capacity deployed is the sum of these individual peaks. However, since Clouds can reallocate resources across many enterprises with different peak periods, a Cloud needs to deploy less capacity.

Cloudonomics Law #4: Aggregate demand is smoother than individual. Aggregating demand from multiple customers tends to smooth out variation. Therefore, Clouds get higher utilization, enabling better economics.

Cloudonomics Law #5: Average unit costs are reduced by distributing fixed costs over more units of output. Larger Cloud providers can therefore achieve some economies of scale.

Cloudonomics Law #6: Superiority in numbers is the most important factor in the result of a combat (Clausewitz). Service providers have the scale to fight rogue attacks.

Cloudonomics Law #7: Space-time is a continuum. Organizations derive competitive advantage from responding to changing business conditions faster than the competition. With Cloud scalability, for the same cost, a business can accelerate its information processing and decision-making.

Cloudonomics Law #8: Dispersion is the inverse square of latency. Reduced latency is increasingly essential to modern applications. A Cloud Computing provider is able to provide more nodes, and hence reduced latency, than an enterprise would want to deploy.

Cloudonomics Law #9: Don’t put all your eggs in one basket. The reliability of a system increases with the addition of redundant, geographically dispersed components such as data centers. Cloud Computing vendors have the scale and diversity to do so.

Cloudonomics Law #10: An object at rest tends to stay at rest. A data center is a very large object. Private data centers tend to remain in locations for reasons such as where the company was founded, or where they got a good deal on property. A Cloud service provider can locate greenfield sites optimally.
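Law #3 is easy to verify numerically: for any set of per-tenant demand curves, the peak of the aggregate can never exceed the sum of the individual peaks. The hourly demand figures below are made up purely for illustration:

```python
# Made-up hourly demand for three tenants whose peaks fall in different hours.
tenants = [
    [3, 9, 2, 1],  # peaks in hour 1
    [1, 2, 8, 2],  # peaks in hour 2
    [7, 1, 1, 2],  # peaks in hour 0
]

# Dedicated infrastructure: each tenant provisions for its own peak.
sum_of_peaks = sum(max(series) for series in tenants)  # 9 + 8 + 7 = 24

# Shared cloud: provision once for the peak of the aggregate demand.
peak_of_sum = max(map(sum, zip(*tenants)))  # max(11, 12, 11, 5) = 12
```

Here the shared pool needs half the capacity of three dedicated deployments, which is exactly the utilization advantage Laws #3 and #4 describe.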