SDDC deployment – Appendix: Preflight before Onboarding

When preparing to deploy your first SDDC, you need to collect the configuration data in advance. Ideally, these settings should have been captured at the design stage, as discussed in the previous chapter.

The following table lists the configuration items you must provide to successfully deploy your first SDDC:

| Configuration section | Configuration item | Description |
| --- | --- | --- |
| SDDC (see Figure 12.3 for details) | Name | Free-text field. You can change the name after the deployment as well. It is recommended to follow the company naming convention. |
| | AWS Region | The AWS Region where your SDDC resides. The Region should fit your subscription, AWS VPC configuration, and AWS DX configuration (if in use). |
| | Deployment | Single host – for POC only, limited to 60 days. Multi-host – production deployment. Stretched cluster – a deployment across two AWS AZs. |
| | Host type | Select one of the available host types. The host type should fit your subscription, design, and workload requirements. You have a choice between i3.metal, i3en.metal, and i4i.metal. See Figure 12.4 for the deployment wizard where the host type is specified. VMware constantly adds new instance types; check the VMware documentation for the currently available ones. |
| | Number of hosts | The number of ESXi hosts in your first cluster. If your design requires a multi-cluster setup, you will add additional clusters after the SDDC is provisioned with the first cluster. |
| AWS Connection (see Figure 12.2 for details) | AWS account | An AWS account you own. Choose the account according to your design and security requirements. |
| | Choose a VPC | Select an AWS VPC (the VPC must be precreated) in your AWS account. This VPC will become the connected VPC after the deployment. |
| | Choose subnet(s) | Select a subnet in your VPC (the subnet must be precreated). The subnet must have enough free IPs for the SDDC deployment (to accommodate the ESXi hosts’ ENI interfaces). The subnet also defines the destination AZ. You cannot change the subnet after the deployment. If you deploy a stretched cluster SDDC, you must select two subnets in two different AZs. |
| SDDC networking | Provide the management subnet CIDR | Provide a private network subnet with enough IP addresses for the SDDC management components (vCenter, ESXi hosts, vSAN network, and so on). A /23 subnet is recommended if you plan to deploy more than 10 hosts. You cannot change the subnet after the deployment. Make sure the subnet does not overlap with on-premises or other connected networks (including AWS). |

Table 12.1 – SDDC Configuration Details
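
If you script your deployments, the same configuration items can be submitted through the VMware Cloud on AWS REST API. The following Python sketch is illustrative only: the org ID, API token, and all configuration values are placeholders, and the exact payload fields (for example, `host_instance_type` and `deployment_type`) should be verified against the current VMC API reference.

```python
import requests

# -- Assumptions: replace with your own values --
REFRESH_TOKEN = "<csp-api-token>"   # CSP API token generated in the Cloud Console
ORG_ID = "<org-id>"                 # your VMware Cloud organization ID

# Exchange the CSP API token for a short-lived access token
auth = requests.post(
    "https://console.cloud.vmware.com/csp/gateway/am/api/auth/api-tokens/authorize",
    params={"refresh_token": REFRESH_TOKEN},
)
auth.raise_for_status()
headers = {"csp-auth-token": auth.json()["access_token"]}

# SDDC configuration mirroring Table 12.1 (illustrative values)
sddc_config = {
    "name": "corp-sddc-01",             # company naming convention
    "region": "us-east-1",              # must match subscription and VPC/DX setup
    "num_hosts": 3,                     # multi-host = production deployment
    "host_instance_type": "i4i.metal",  # one of the available host types
    "provider": "AWS",
    "deployment_type": "SingleAZ",      # or "MultiAZ" for a stretched cluster
    "vpc_cidr": "10.2.0.0/23",          # management subnet CIDR, /23 for >10 hosts
    # The connected VPC/subnet would be supplied via the account linking section
}

resp = requests.post(
    f"https://vmc.vmware.com/vmc/api/orgs/{ORG_ID}/sddcs",
    json=sddc_config, headers=headers,
)
resp.raise_for_status()
print("SDDC provisioning task:", resp.json()["id"])
```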

You can review the deployment wizard in Figure 12.3:

Figure 12.3 – SDDC deployment wizard: SDDC Properties

You can review the VPC and subnet details of the SDDC wizard in Figure 12.4:

Figure 12.4 – SDDC deployment wizard: AWS VPC and subnet

After you have provisioned the SDDC, you must configure access to the vSphere Web Client to manage your SDDC through VMware vCenter Server. By default, access to vCenter is not allowed. Using the NSX Manager UI, you create a Management Gateway firewall rule that specifies an IP address or a subnet entitled to access vCenter. An “allow all” rule is not possible.
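
The same Management Gateway rule can be scripted against the NSX Policy API, which every SDDC exposes through a reverse proxy URL (shown in the VMC Console). Below is a minimal sketch, assuming the predefined `VCENTER` management group and `HTTPS` service exist under the `mgw` domain as documented; the group name, admin subnet, and rule name are placeholders.

```python
import requests

# -- Assumptions: replace with values from your SDDC --
NSX_PROXY = "<nsx-reverse-proxy-url>"  # from the VMC Console, Settings tab
TOKEN = "<csp-auth-token>"             # obtained as shown earlier
headers = {"csp-auth-token": TOKEN}

# A management group holding the admin subnet allowed to reach vCenter
group = {
    "display_name": "admin-workstations",
    "expression": [{
        "resource_type": "IPAddressExpression",
        "ip_addresses": ["198.51.100.0/24"],   # illustrative admin subnet
    }],
}
requests.put(
    f"{NSX_PROXY}/policy/api/v1/infra/domains/mgw/groups/admin-workstations",
    json=group, headers=headers,
).raise_for_status()

# MGW rule allowing HTTPS from that subnet to vCenter only -- no "allow all"
rule = {
    "display_name": "allow-admins-to-vcenter",
    "source_groups": ["/infra/domains/mgw/groups/admin-workstations"],
    "destination_groups": ["/infra/domains/mgw/groups/VCENTER"],  # predefined group
    "services": ["/infra/services/HTTPS"],
    "action": "ALLOW",
    "scope": ["/infra/labels/mgw"],
}
requests.put(
    f"{NSX_PROXY}/policy/api/v1/infra/domains/mgw/gateway-policies/default"
    "/rules/allow-admins-to-vcenter",
    json=rule, headers=headers,
).raise_for_status()
```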

Purchasing and onboarding – Appendix: Preflight before Onboarding

In this chapter, we will cover the most important configuration items you need when you deploy the SDDC and configure a hybrid cloud environment.

You will find a detailed description of these configuration steps and items in the previous chapters of this book.

When purchasing the service and preparing for the first SDDC deployment, you need to make several choices. These choices can have a large impact on the ongoing operation of the service, so make sure they are well thought out, as you will not be able to change some of them later.

Purchasing and funding

When purchasing the service, you can select one of the following options:

  • A direct VMware purchase
  • AWS resell
  • Purchasing through a Managed Service Provider (MSP)

VMware Cloud on AWS supports all three routes to market. Depending on your purchasing strategy, you may find one or the other better suited to your needs.

Note

Some services available for VMware Cloud on AWS can only be purchased directly from VMware, for example, Microsoft host-based licenses for workloads on VMware Cloud on AWS.

When purchasing from VMware, you can choose how you want to pay for the service:

  • VMware Purchasing Programs: You can select from a range of programs, most of them offering so-called Credits. You can use Credits toward payment for VMware Cloud on AWS. Consult a VMware sales representative for more details about the available programs. (More details on VMware Purchasing Programs can be found here: https://customerconnect.vmware.com/web/vmware/spp-landing.)
  • Pay by invoice: You can activate pay by invoice using the VMware Cloud Console.
  • Pay with a credit card: Applicable for small purchases up to $25,000.

Consumption options

When deploying VMware Cloud on AWS SDDC, you have a choice between the following:

  • Subscription: Your commitment to buy a certain amount of host capacity for a defined period. When purchasing a subscription, you select the AWS Region, host type, and the number of hosts. You can pay upfront or monthly. If purchasing from VMware or AWS, you can select the following:
    • Flexible subscription: The terms of the subscription (number of hosts, region, host types) can be changed over time (limitations apply)
    • Standard subscription: The terms of the subscription are fixed and cannot be changed
  • On-demand: You can run VMware Cloud on AWS SDDC using on-demand prices. You are free to select the region, host type, and the number of hosts.

Typically, a standard 3-year term subscription is the most cost-effective option, while on-demand prices are the highest. Depending on your use case, one option or another might work better. In our experience, a flexible subscription strikes the right balance between flexibility and cost savings.
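
A rough calculation illustrates the trade-off. The hourly rates in the sketch below are purely hypothetical placeholders, not real list prices; substitute the actual rates for your host type and Region:

```python
# Hypothetical hourly rates per host -- for illustration only, not real prices
ON_DEMAND = 10.00   # $/hour, no commitment
ONE_YEAR = 7.00     # $/hour effective with a 1-year subscription
THREE_YEAR = 5.00   # $/hour effective with a 3-year subscription

HOURS_PER_YEAR = 24 * 365
hosts = 4

for label, rate in [("on-demand", ON_DEMAND),
                    ("1-year", ONE_YEAR),
                    ("3-year", THREE_YEAR)]:
    yearly = rate * HOURS_PER_YEAR * hosts
    print(f"{label:>9}: ${yearly:,.0f}/year for {hosts} hosts")
# With these assumed rates, a 3-year term costs half the on-demand rate.
```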

Networking – Knowing the Best Practices, FAQs, and Common Pitfalls

The network communication with workloads deployed on your SDDC is a key part of the overall user experience and probably one of the most complex design areas. Network configuration is under the organization’s control; VMware provides only the underlying network connectivity on top of the AWS hardware infrastructure.

Let’s highlight the most common network misconfigurations:

  • Insufficient connectivity between on-premises and the VMware Cloud on AWS SDDC

It’s a common practice to initially configure an IPsec VPN over the internet to achieve basic connectivity between on-premises and the SDDC and to secure the traffic flow. However, a VPN tunnel over the internet is not suitable for a mass migration of workloads: live vMotion over the internet is not supported, and unpredictable bandwidth and latency make the migration timeline hard to plan. For a large-scale migration and/or a hybrid cloud use case, you need to plan for a dedicated private connection to your SDDC.
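
A back-of-the-envelope estimate shows why. The following sketch (with assumed dataset size, link speeds, and efficiency factor) compares bulk-copy time over a shared internet VPN with a dedicated line:

```python
def transfer_days(dataset_tb: float, link_mbps: float,
                  efficiency: float = 0.7) -> float:
    """Estimate days needed to copy dataset_tb over a link of link_mbps.

    efficiency accounts for protocol overhead and contention on a shared link.
    """
    bits = dataset_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)   # effective throughput
    return seconds / 86400

# 100 TB over a 500 Mbps internet VPN vs. a 10 Gbps dedicated line (assumed)
print(f"VPN (500 Mbps): {transfer_days(100, 500):.0f} days")    # ~26 days
print(f"DX  (10 Gbps) : {transfer_days(100, 10_000):.0f} days") # ~1 day
```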

  • Underestimating Layer 2 network extension complexity

HCX and/or NSX Standalone Edge provide a unique feature – the ability to stretch a Layer 2 broadcast domain for a selected VLAN, allowing workloads to retain their original IP addresses. This feature helps enormously to migrate applications seamlessly, without any impact on client configuration. On the other hand, it comes with several trade-offs impacting workload availability and/or performance:

    • For workloads deployed on a Layer 2 extended segment (even with the MON feature enabled), all traffic sent to destinations residing outside of the SDDC network first reaches the default gateway, which is located on-premises. This can cause unexpectedly high latency when accessing workloads residing in a native AWS VPC, including the connected VPC.
    • Workloads have a hard dependency on the on-premises default gateway. If the link between on-premises and the SDDC stops functioning, workloads on the extended leg of the segment cannot reach the default gateway and lose communication with external destinations.
    • Undersized HCX Layer 2 extension appliances: All broadcast traffic within the VLAN must traverse the extension appliances on both sides of the tunnel. If an appliance is overloaded and/or does not have enough resources, workloads residing in the SDDC drop external connections. This scenario is often observed with entry-level clusters based on the i3.metal host type. To mitigate it, you can scale out by deploying multiple extension appliance pairs and distributing the extended segments between them.
    • Extension appliance availability: As mentioned earlier, the Layer 2 extension has a direct dependency on the HCX appliance. If the appliance stops working, becomes corrupted, or restarts, network communication is affected. If you plan to keep the extension in place after the migration is complete, use the HA feature of the HCX extension appliances. Bear in mind that for a complex environment with many extended VLANs, configuring HA consumes additional compute and storage resources on both sides of the environment, including the SDDC. You may need to scale out the vSphere cluster hosting the appliances on the SDDC side, incurring additional costs.
    • Security concerns: Many security teams do not allow a Layer 2 extension over the public internet, as it poses security risks and exposes sensitive broadcast traffic to the internet. If not properly addressed in the design phase, this can drastically affect your migration plans if you were counting on live migration with retained IP addresses. The best solution is to use a dedicated DX line and pass the extension traffic over it, which should address most of the security team’s concerns.
  • Unidentified network dependencies after migration

Many organizations claim that performance suffers after migrating workloads to the cloud. Some of these concerns are due to not following the best practices while migrating; however, in many cases, the cause has nothing to do with the SDDC. For a complex distributed application where not all components were properly identified and migrated to the cloud, the traffic may take additional hops traversing the WAN link(s), adding unforeseen latency to the application. An example is the migration of a SQL Server data warehouse where the centralized integration service (SSIS) was left on-premises, causing all data to first move back on-premises and then be retransmitted to the SDDC. The impact of this configuration was measured as a 300% increase in OLAP cube generation time. Troubleshooting and searching for affected traffic flows can be a complex and time-consuming task. VMware Aria Operations for Networks can help you visualize the traffic flows of a selected application.
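
If you only have raw flow exports, even a simple script can surface such dependencies. The sketch below assumes a CSV of flow records and an address plan where the SDDC and on-premises prefixes are known; both the file layout and the prefixes are placeholders:

```python
import csv
import ipaddress

# Assumed address plan -- adjust to your environment
SDDC_NETS = [ipaddress.ip_network("10.2.0.0/16")]
ONPREM_NETS = [ipaddress.ip_network("192.168.0.0/16")]

def in_any(ip: str, nets) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets)

# flows.csv columns (assumed): src_ip,dst_ip,bytes
suspect: dict[tuple[str, str], int] = {}
with open("flows.csv") as f:
    for row in csv.DictReader(f):
        src, dst = row["src_ip"], row["dst_ip"]
        # A migrated VM still depending on an on-premises component
        if in_any(src, SDDC_NETS) and in_any(dst, ONPREM_NETS):
            suspect[(src, dst)] = suspect.get((src, dst), 0) + int(row["bytes"])

# Largest WAN-crossing flows first -- prime candidates for co-migration
for (src, dst), total in sorted(suspect.items(), key=lambda kv: -kv[1]):
    print(f"{src} -> {dst}: {total:,} bytes over the WAN")
```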

Avoiding common pitfalls – Knowing the Best Practices, FAQs, and Common Pitfalls

In the previous section, we focused on how to do things right. However, it’s also important to highlight the most common scenarios, configurations, and design decisions where the resulting configuration proved to be ineffective and error-prone.

Compute

Compute resources provide the CPU and memory capacity for virtual machines. Let’s review the most common misconfigurations and/or suboptimal design choices:

  • Sizing

It’s often the case that VMware Cloud on AWS SDDCs are either undersized or oversized. Undersized environments lead to low performance and a bad user experience, while oversized environments are expensive in terms of cost per VM. On-premises, fixing an undersized vSphere environment after the fact often means running into extended procurement cycles. VMware Cloud on AWS, however, benefits from the flexible and elastic capacity of the public cloud: paired with the right Elastic DRS policy, organizations can achieve cost savings by leveraging the scale-in option and, when demand grows, performance bursts by scaling out the cluster. We recommend using custom Elastic DRS policies, which give you much better control not only over storage resources but also over CPU and memory.
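
Elastic DRS policies can also be managed through the VMC API. The sketch below is a hedged illustration: the endpoint path and the threshold field names should be verified against the current eDRS API reference, and all IDs and values are placeholders:

```python
import requests

TOKEN = "<csp-auth-token>"   # obtained from the CSP token exchange
ORG_ID = "<org-id>"
SDDC_ID = "<sddc-id>"
CLUSTER_ID = "<cluster-id>"

# Illustrative custom policy: scale out under CPU/memory pressure, allow scale-in.
# Field names below are illustrative; check the API reference for your release.
policy = {
    "policy_type": "custom",
    "min_hosts": 3,
    "max_hosts": 8,
    "thresholds": {
        "cpu":     {"scale_out": 80, "scale_in": 50},   # percent utilization
        "memory":  {"scale_out": 80, "scale_in": 50},
        "storage": {"scale_out": 70, "scale_in": 40},
    },
}

resp = requests.put(
    f"https://vmc.vmware.com/api/orgs/{ORG_ID}/sddcs/{SDDC_ID}"
    f"/clusters/{CLUSTER_ID}/edrs-policy",
    json=policy,
    headers={"csp-auth-token": TOKEN},
)
resp.raise_for_status()
print("eDRS policy updated")
```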

  • Host type

Another common misconfiguration we observe a lot is selecting the wrong host type. Most issues involve configurations with the i3.metal host type. i3.metal might be suitable for running general-purpose workloads, but its outdated CPU (Broadwell) and lack of hyperthreading (and, as a result, its low number of logical CPUs) make resource contention very likely, especially with entry-level clusters (https://vmc.techzone.vmware.com/resource/entry-level-clusters-vmware-cloud-aws). A two-host i3.metal cluster is limited to 35 simultaneously running VMs, as most of the CPU resources are allocated to management VMs. Such a cluster might be suitable as a management cluster but should not be considered for production. The i3.metal End of Sale (EoS) naturally eliminates this problem; however, you might still be tempted to take i3.metal on-demand for an ongoing project to profit from the lower cost. We strongly recommend not doing so at this point and considering i4i.metal instead, which has a much more powerful and modern CPU.

  • SDDC upgrade and lifecycle management

Most of the observed issues are tied to wrong expectations. VMware releases a new SDDC software bundle every 6 months, based on the latest vSphere and NSX versions at release time. For all the excitement a new release brings, there are a couple of issues to underline:

    • Do not expect your SDDC to be upgraded overnight. For a brownfield (existing) SDDC, the estimated upgrade time is 6+ months; depending on the complexity of your SDDC, it may be longer.
    • Version inconsistency: VMware Cloud on AWS always deploys new SDDCs with the latest available build; you cannot specify a build version when deploying your SDDC. Current bundles use vSphere 8, while your on-premises environment might still be on vSphere 7. This can negatively affect reverse migration, cause incompatibility with management/automation/monitoring tools, and prevent you from raising the virtual hardware version of the VMs you migrate to the cloud.
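
A quick way to spot this inconsistency is to compare the vCenter versions on both sides before planning migrations. The sketch below uses pyVmomi; hostnames and credentials are placeholders:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect

def vcenter_version(host: str, user: str, pwd: str) -> str:
    """Return the version and build of a vCenter Server instance."""
    ctx = ssl._create_unverified_context()  # lab only; use proper certs in production
    si = SmartConnect(host=host, user=user, pwd=pwd, sslContext=ctx)
    try:
        about = si.content.about
        return f"{about.version} (build {about.build})"
    finally:
        Disconnect(si)

# Hypothetical endpoints/credentials -- replace with your own
onprem = vcenter_version("vcenter.onprem.example",
                         "administrator@vsphere.local", "***")
cloud = vcenter_version("vcenter.sddc.example.vmwarevmc.com",
                        "cloudadmin@vmc.local", "***")
print("on-premises:", onprem)
print("cloud SDDC :", cloud)
# A cloud version newer than on-premises constrains reverse migration of
# VMs whose virtual hardware has been upgraded in the cloud.
```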
  • Configuration management

VMware Cloud on AWS is offered as a managed service. Most ESXi, vSphere cluster, and vCenter configuration settings are predefined and cannot be changed. If your applications or automation tools depend on a particular advanced setting, make sure to verify the configuration before deployment; you will not be able to change the values afterward.
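
Before deployment, it is worth exporting the advanced settings your tooling depends on from the on-premises hosts, so you can compare them with what the managed service enforces. The following is a minimal pyVmomi sketch; the connection details are placeholders, and the `Net.` prefix is just an example filter:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Hypothetical connection details -- replace with your on-premises vCenter
ctx = ssl._create_unverified_context()  # lab only; use proper certs in production
si = SmartConnect(host="vcenter.onprem.example",
                  user="administrator@vsphere.local", pwd="***", sslContext=ctx)

try:
    # Walk all ESXi hosts and dump the advanced options matching a prefix,
    # so they can be checked against the VMC defaults before committing
    container = si.content.viewManager.CreateContainerView(
        si.content.rootFolder, [vim.HostSystem], True)
    for host in container.view:
        for opt in host.configManager.advancedOption.QueryOptions("Net."):
            print(host.name, opt.key, "=", opt.value)
    container.Destroy()
finally:
    Disconnect(si)
```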