Nvidia GH200 624GB high-end server for inferencing, fine-tuning AI GPT LLM

Description: Perfect for Mistral Large 2 at 8-bit and Llama-3.1 70B at 16-bit.

Specs:
Nvidia Grace-Hopper Superchip
72-core Nvidia Grace CPU
Nvidia Hopper H200 Tensor Core GPU
480GB of LPDDR5X memory with ECC
144GB of HBM3e memory
624GB of total fast-access memory
NVLink-C2C: 900 GB/s of bandwidth
Programmable from 450W to 1000W TDP (CPU + GPU + memory)
2x high-efficiency 2000W PSU
2x PCIe gen4 M.2 slots on board
2x PCIe gen5 2.5" drive slots (NVMe) without Bluefield-3
4x PCIe gen5 2.5" drive slots (NVMe) with Bluefield-3
3x FHFL PCIe Gen5 x16
1x USB 3.2 port
2x RJ45 10GbE ports
1x RJ45 IPMI port
1x Mini DisplayPort
Stainless steel cage nuts
Air-cooled, 6x 60mm fans
Rail kit
2U, 440 x 88 x 900 mm (17.3 x 3.5 x 35.4")
34 kg (75 lbs)
3-year manufacturer's warranty
Free express shipping worldwide

Options:
Liquid cooling including air-liquid in-rack CDU, 4U, 10kW: $4000
Liquid cooling including liquid-liquid in-rack CDU, 2U, 20kW: $4000
Air cooling mod, quiet Noctua fans at 25 dB instead of loud factory fans (please note that cooling and system performance at high load are slightly reduced): $200
Network card Nvidia Bluefield-3 2-port 400Gb QSFP112 (adds two more 2.5" drive slots, for a total of 4): $4000
Network card Nvidia ConnectX-7 2-port 200Gb QSFP56/112: $1500
Network card Intel E810-CQDA2 2-port 100Gb QSFP28: $500
Hard disk Corsair Force Series MP600 Pro 8TB, M.2 SSD: $900
Hard disk Corsair Force Series MP600 Pro 8TB, M.2 SSD: $900
Hard disk Solidigm D5-P5336 61TB 2.5" U.2 SSD: $6500
Hard disk Solidigm D5-P5336 61TB 2.5" U.2 SSD: $6500
Hard disk Solidigm D5-P5336 61TB 2.5" U.2 SSD: $6500
Hard disk Solidigm D5-P5336 61TB 2.5" U.2 SSD: $6500
Storage controller HighPoint Rocket 1628A PCIe Gen5: $1500
Raid controller Graid SupremeRAID SR-1010: $3300
OS Ubuntu Server 24.04 preinstalled: $0
Request for additional or different components and special wishes: $0

GPTrack.ai presents: The ultimate "small-sized" ready-to-use rack server systems for AI and HPC.
Run, tune and train the biggest GPT large language models locally. Currently the only "small-sized" NV-linked superchip rack systems in the world. Get our mind-blowing Nvidia GH200 Grace-Hopper superchip systems. Order our mind-blowing Nvidia GB200 Grace-Blackwell superchip systems now! Get our mind-blowing AMD Mi300 systems.

We are very proud to announce the world's first and only Nvidia GH200 Grace-Hopper Superchip and Nvidia GB200 Grace-Blackwell Superchip-powered NV-linked, liquid-cooled, CDU-integrated, ready-to-use, "small-sized" rack server systems. Multiple NV-linked GH200 or GB200 act as a single giant GPU (CPU-GPU superchip) with one single giant coherent memory pool. We are the only ones who offer smaller systems than a complete rack with "only" 1, 2, 4, 8, 16 or 18 superchips (NVL2, NVL4, NVL8, NVL16 and NVL36). If you like AMD, we offer Mi300 systems too. All systems have a coolant distribution unit (CDU) integrated into the rack, are ready-to-use and are perfect for inferencing insanely huge LLMs, quick fine-tuning and training of LLMs.

Example use case 1: Inferencing Llama-3.1 405B and Mistral Large 2
Llama-3.1 405B: https://ai.meta.com/blog/meta-llama-3-1/
Mistral Large 2: https://mistral.ai/news/mistral-large-2407/
Llama-3.1 405B and Mistral Large 2 are the most powerful open-source models by far and even beat GPT-4omni.
Llama-3.1 405B with 8-bit quantization needs at least 405GB of memory to swiftly run inference! Mistral Large 2 with 8-bit quantization needs at least 123GB of memory to swiftly run inference! Luckily, GH200 has up to 288GB, GB200 Blackwell has up to 14TB. Mi300A has 512GB and Mi300X has 1.5TB. With GH200 Mistral Large 2 can be run in VRAM only for ultra high inference speed (approx. 50 tokens/sec). With Mi300 and GB200 Blackwell this is also possible for Llama-3.1 405B. With GB200 Blackwell you can expect up to 1000 tokens/sec. If the model is bigger than VRAM you can only expect approx. 1-10 tokens/sec.
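
The memory figures quoted for use case 1 follow from a simple rule of thumb: the weights alone take roughly (number of parameters) x (bits per weight) / 8 bytes. The sketch below checks that rule against the 144GB of HBM3e and 624GB of total memory listed in the specs. It is a rough estimate only: the function names are made up for this illustration, the parameter counts are the nominal model sizes (405B, 123B, 70B), and the KV cache, activations and framework overhead are assumed away, so real deployments need headroom on top of these numbers.

```python
# Rough weight-only memory estimate. Names below are illustrative, not from
# any library. Assumes 1 GB = 1e9 bytes and ignores KV cache, activations
# and runtime overhead, which all need additional memory in practice.

HBM3E_GB = 144          # GPU memory of this GH200 system (from the specs)
TOTAL_FAST_GB = 624     # HBM3e + LPDDR5X (from the specs)


def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights alone, in GB."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: int, available_gb: float) -> bool:
    """True if the weights alone fit into the given amount of memory."""
    return weight_memory_gb(params_billion, bits_per_weight) <= available_gb


if __name__ == "__main__":
    # (model name, nominal parameters in billions, bits per weight)
    cases = [
        ("Mistral Large 2", 123, 8),
        ("Llama-3.1 70B", 70, 16),
        ("Llama-3.1 405B", 405, 8),
    ]
    for name, params, bits in cases:
        need = weight_memory_gb(params, bits)
        print(
            f"{name} at {bits}-bit: ~{need:.0f} GB of weights, "
            f"fits in HBM3e: {fits(params, bits, HBM3E_GB)}, "
            f"fits in total memory: {fits(params, bits, TOTAL_FAST_GB)}"
        )
```

Under these assumptions, Mistral Large 2 at 8-bit and Llama-3.1 70B at 16-bit fit into the 144GB of HBM3e, while Llama-3.1 405B at 8-bit only fits into the combined 624GB.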
4-bit quantization seems to be the best trade-off between speed and accuracy, but is natively only supported by GB200 Blackwell.

Example use case 2: Fine-tuning Llama-3.1 405B with PyTorch FSDP and Q-Lora
Tutorial: https://www.philschmid.de/fsdp-qlora-llama3
Models need to be fine-tuned on your data to unlock the full potential of the model. But efficiently fine-tuning bigger models like Llama-3.1 405B remained a challenge until now. This blog post walks you through how to fine-tune Llama 3 using PyTorch FSDP and Q-Lora with the help of Hugging Face TRL, Transformers, peft & datasets.
Fine-tuning big models within a reasonable time requires special and beefy hardware! Luckily, GH200, GB200 or Mi300 are ideal to complete this task extremely quickly.

Example use case 3: Generate videos with Open-Sora
Download: https://github.com/hpcaitech/Open-Sora/
Open-Sora is democratizing efficient video production for all.
Generating videos with Open-Sora requires special and beefy hardware! Luckily, GH200, GB200 or Mi300 are ideal for this task.

Example use case 4: Creating a Large Language Model from scratch
Tutorial: https://www.pluralsight.com/resources/blog/data/how-build-large-language-model
Imagine stepping into the world of language models as a painter stepping in front of a blank canvas. The canvas here is the vast potential of Natural Language Processing (NLP), and your paintbrush is the understanding of Large Language Models (LLMs). This article aims to guide you, new to NLP, in creating your first Large Language Model from scratch, focusing on the Transformer architecture and utilizing TensorFlow and Keras.
Training an LLM from scratch within a reasonable time requires special and extremely beefy hardware! Luckily, GH200, GB200 or Mi300 are ideal for this task.

Why should you buy your own hardware?
"You'll own nothing and you'll be happy?" No!!! Never should you bow to Satan and rent stuff that you can own. In other areas, renting stuff that you can own is very uncool and uncommon.
Or would you prefer to rent "your" car instead of owning it? Most people prefer to own their car, because it's much cheaper, it's an asset that has value and it makes the owner proud and happy. The same is true for compute infrastructure.
Even more so, because data and compute infrastructure are of great value and importance and are preferably kept on premises, not only for privacy reasons but also to keep control and mitigate risks. If somebody else has your data and your compute infrastructure you are in big trouble.
Speed, latency and ease-of-use are also much better when you have direct physical access to your stuff.
With respect to AI and specifically LLMs there is another very important aspect. The first thing big tech taught their closed-source LLMs was to be "politically correct" (lie) and implement guardrails, "safety" and censorship to such an extent that the usefulness of these LLMs is severely limited. Luckily, the (open-source) tools are out there to build and tune AI that is really intelligent and really useful. But first, you need your own hardware to run it on.

What are the main benefits of GH200 Grace-Hopper and GB200 Grace-Blackwell?
They have enough memory to run, tune and train the biggest LLMs currently available.
Their performance in every regard is almost unreal (up to 8520 times faster than x86).
There are no alternative systems with the same amount of memory.
Ideal for AI, especially inferencing, fine-tuning and training of LLMs.
Multiple NV-linked GH200 or GB200 act as a single giant GPU.
Optimized for memory-intensive AI and HPC performance.
Ideal for HPC applications such as vector databases.
Easily customizable, upgradable and repairable.
Privacy and independence from cloud providers.
Cheaper and much faster than cloud providers.
They can be very quiet (with liquid-liquid CDU).
Reliable and energy-efficient liquid cooling.
Flexibility and the possibility of offline use.
Gigantic amounts of coherent memory.
They are very power-efficient.
The lowest possible latency.
They are beautiful.
CUDA enabled.
Run Linux.

GB200 Blackwell
The coming Nvidia GB200 Grace-Blackwell superchip has truly amazing specs to show off. GPTrack.ai ready-to-use rack server systems with multiple NV-linked Nvidia GB200 Grace-Blackwell (up to 72) will arrive Q4 2024. Be one of the first in the world to get a GB200 rack system. Order now!

How do they differ from alternative systems?
The main difference between GH200/GB200 and alternative systems is that with GH200/GB200, the GPU is connected to the CPU via a 900 GB/s NVLink vs. 128 GB/s PCIe gen5 used by traditional systems. Furthermore, multiple superchips can be connected via 900/1800 GB/s NVLink vs. orders of magnitude slower network connections used by traditional systems. Since these are the main bottlenecks, GH200/GB200's high-speed connections directly translate to much higher performance compared to traditional architectures. Also, multiple NV-linked GH200 or GB200 act as a single giant GPU (CPU-GPU superchip) with one single giant coherent memory pool.

How do they differ from competitors' server systems?
Size: We focus on systems that are not bigger than one single rack. With GB200 that gives you more than an exaflop of compute. If that is really not enough for you, we are happy to make you a custom offer. But for many people, one complete rack is more than needed and too expensive. That is why we also offer smaller systems with only 1, 2, 4, 8, 16 or 18 superchips (NVL2, NVL4, NVL8, NVL16 and NVL36).
We are, to our knowledge, the only ones in the world where you can get systems smaller than a complete GH200 NVL32 or GB200 NVL72 rack.
In-rack CDU: Our rack server systems come standard with liquid cooling and a CDU integrated directly into the rack. You can choose between an air-liquid and liquid-liquid CDU.
Ready-to-use: In contrast to other vendors, our systems come fully integrated and ready-to-use. Everything that is needed is included and tested. All you need to do is plug your system in to run it.

Technical details of our GH200/GB200 rack server systems (base configuration):
Standard 19-inch or 21-inch OCP rack
Liquid-cooled
In-rack CDU (air-liquid or liquid-liquid)
Multiple Nvidia GH200 Grace-Hopper Superchips
Multiple Nvidia GB200 Grace-Blackwell Superchips
Multiple 72-core Nvidia Grace CPUs
Multiple Nvidia Hopper H100 Tensor Core GPUs (on request)
Multiple Nvidia Hopper H200 Tensor Core GPUs (on request)
Multiple Nvidia Blackwell B100 Tensor Core GPUs
Up to 72x 480GB of LPDDR5X memory with error-correction code (ECC)
Up to 13.5TB of HBM3e memory
Up to 30TB of total fast-access memory
NVLink-C2C: 900 GB/s of bandwidth
GH200: Programmable from 450W to 1000W TDP (CPU + GPU + memory)
GB200: Programmable from 1200W to 2700W TDP (CPU + 2 GPU + memory)
Up to 6x power shelves
Up to 72x PCIe gen5 M.2 slots on board
Up to 288x PCIe gen5 drives (NVMe)
Up to 108x FHFL PCIe Gen5 x16
3-year manufacturer's warranty
Up to 48U, 600 x 2616 x 1200 mm (23.6 x 103 x 47.2")
Up to 1500 kg (3300 lbs)

Optional components:
NIC Nvidia Bluefield-3 400Gb
NIC Nvidia ConnectX-7 200Gb
NIC Intel 100Gb
Up to 72x 4TB M.2 SSD
Up to 288x 8TB E1.S SSD
Up to 288x 60TB 2.5" SSD
Storage controller
Raid controller
OS preinstalled
Anything possible on request

Need something different? We are happy to build custom systems to your liking.

Compute performance of one GH200:
67 teraFLOPS FP64
1 petaFLOPS TF32
2 petaFLOPS FP16
4 petaFLOPS FP8

Compute performance of one GB200:
90 teraFLOPS FP64
5 petaFLOPS TF32
10 petaFLOPS FP16
20 petaFLOPS FP8
40 petaFLOPS FP4

Benchmarks:
Phoronix has so far benchmarked the Grace CPU. More is coming soon:
https://www.phoronix.com/review/nvidia-gh200-gptshop-benchmark
https://www.phoronix.com/review/nvidia-gh200-amd-threadripper
https://www.phoronix.com/review/aarch64-64k-kernel-perf
https://www.phoronix.com/review/nvidia-gh200-compilers

White paper: Nvidia GH200 Grace-Hopper white paper

Price: 42000 USD

Location: Ebern

End Time: 2024-11-08T10:44:53.000Z

Shipping Cost: 0 USD