NSX-T UI unavailable: error code 101

Published by Valentin on

I recently ran into an error that is not documented in the public VMware KB.

Symptoms:

The UI of every NSX Manager shows this error:

Some appliances or component are not functioning properly:
Component health: Manager: unknown, Search: Unknown; node_mgmt:UP, UI:UP
error code :101

On the NSX Manager command line, df -h shows that the /var/dump partition is full.

The log file /var/log/corfu/corfu-compactor-audit.log contains entries that report this error: "java.lang.OutOfMemoryError: Java heap space".
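
The two symptom checks above (a full /var/dump partition and heap errors in the compactor audit log) can be scripted. This is only a minimal sketch. It assumes you have a root shell with Python 3 available on the manager, or that you run it against a copy of the log. The function names are my own and not part of any NSX tooling, and the percentage is computed slightly differently from the one df prints.

```python
import shutil
from pathlib import Path

# Error string quoted from corfu-compactor-audit.log in this post.
OOM_MARKER = "java.lang.OutOfMemoryError: Java heap space"


def partition_usage_percent(path):
    """Return the used share (0-100) of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def count_oom_lines(log_path, marker=OOM_MARKER):
    """Count the lines of `log_path` that contain the heap-space error."""
    count = 0
    with open(log_path, "r", errors="replace") as handle:
        for line in handle:
            if marker in line:
                count += 1
    return count


if __name__ == "__main__":
    # Paths taken from the symptoms described above.
    dump_dir = Path("/var/dump")
    audit_log = Path("/var/log/corfu/corfu-compactor-audit.log")

    if dump_dir.exists():
        print(f"{dump_dir}: {partition_usage_percent(dump_dir):.1f}% used")
    else:
        print(f"{dump_dir} not found on this machine")

    if audit_log.exists():
        print(f"{audit_log}: {count_oom_lines(audit_log)} heap-space error line(s)")
    else:
        print(f"{audit_log} not found on this machine")
```

A usage close to 100% together with a non-zero error count matches the situation described in this post.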

The NSX version was 3.2.2.

The cause:

In my case, it was linked to the Cluster Boot Manager, which had failed multiple times.

The error seen in corfu-compactor-audit.log means that the compactor process, which shrinks the database, does not have enough heap memory to run.

The resolution:

You must contact VMware support to resolve this issue. They have an internal KB for it, but the procedure takes a long time to apply.

  1. They will take a cold clone of each NSX Manager
    • Graceful shutdown
    • Clone NSX Manager
    • Start
    • Wait until the CLI command get cluster status reports a clean status (this takes longer than usual)
    • Continue with the next one
  2. Shut down the NSX services on each NSX Manager
  3. Shut down the Corfu database on each NSX Manager
  4. Increase the heap space allocated to the compactor to 8 GB
  5. On the 1st node
    • Run the compactor (it took more than 20 min)
    • Run the compactor without the locking option (it took less than 5 min)
  6. On the 2nd node
    • Run the compactor (it took more than 25 min)
    • Run the compactor without the locking option
      • On the 1st and 2nd nodes, it took less than 5 min for each of them
  7. On the 3rd node
    • Run the compactor (it took more than 25 min)
    • Run the compactor without the locking option
      • On the 1st, 2nd, and 3rd nodes, it took less than 5 min for each of them
  8. Decrease the heap space allocated to the compactor back to 2 GB
  9. Start the Corfu DB on each node
    • Monitor the resync
  10. Start the NSX services
    • Monitor with get cluster status
  11. Delete all the dumps stored in /var/dump
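
Before the last step, it can help to see what is actually taking up the space in /var/dump. The sketch below only lists the largest files and deletes nothing. The function names are hypothetical helpers of mine, and it again assumes Python 3 is available where you run it. Check with support before removing anything.

```python
from pathlib import Path


def largest_files(directory, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs, largest first."""
    files = []
    for path in Path(directory).rglob("*"):
        try:
            # Skip symlinks so that a link target is not counted twice.
            if not path.is_symlink() and path.is_file():
                files.append((path.stat().st_size, path))
        except OSError:
            # The file vanished or is unreadable: ignore it.
            continue
    files.sort(key=lambda item: item[0], reverse=True)
    return files[:limit]


def human_size(num_bytes):
    """Format a byte count with binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024.0:
            return f"{size:.1f} {unit}"
        size /= 1024.0
    return f"{size:.1f} TiB"


if __name__ == "__main__":
    dump_dir = Path("/var/dump")  # partition named in this post
    if dump_dir.exists():
        for size, path in largest_files(dump_dir):
            print(f"{human_size(size):>12}  {path}")
    else:
        print(f"{dump_dir} not found on this machine")
```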
