

{"id":477,"date":"2025-11-25T09:22:50","date_gmt":"2025-11-25T01:22:50","guid":{"rendered":"https:\/\/high-flyer.in.suopu.cc\/?p=477"},"modified":"2025-11-25T09:22:50","modified_gmt":"2025-11-25T01:22:50","slug":"hfai-python-%e4%bb%bb%e5%8a%a1%e6%8f%90%e4%ba%a4%e4%bb%bb%e6%84%8f%e6%89%80%e8%87%b3%ef%bc%8c%e8%90%a4%e7%81%ab%e8%ae%ad%e7%bb%83%e8%a1%8c%e4%ba%91%e6%b5%81%e6%b0%b4","status":"publish","type":"post","link":"https:\/\/high-flyer.in.suopu.cc\/en\/blog\/477\/","title":{"rendered":"hfai python | Submit Tasks at Will, Train Smoothly on Firefly"},"content":{"rendered":"<p>High-Flyer AI has released its <a href=\"https:\/\/www.high-flyer.cn\/blog\/hfai_usage\">deep learning suite hfai<\/a>, refined over many years, and many fellow researchers and developers have since asked to try it. The suite offers a wide range of features. Once you are familiar with its conventions, you can easily call on the platform's computing resources and complete training tasks efficiently.<\/p>\n<p>To this end, we have launched the \u201chfai Mantra\u201d series, which introduces, one installment at a time, the design ideas and principles behind some of hfai's features. It aims to help you master them faster and use hfai's \u201cdivine skills\u201d to take on the challenges of deep learning tasks with ease.<\/p>\n<p>The previous two installments introduced <a href=\"https:\/\/www.high-flyer.cn\/blog\/hfai_workspace\">hfai workspace<\/a> and <a href=\"https:\/\/www.high-flyer.cn\/blog\/hfai_venv\">haienv<\/a>, which help users quickly <strong>synchronize their local (PC, personal cluster, etc.) project directories (code, data) and environments<\/strong> to the remote cluster. 
Taken together, these two steps can be seen as \u201cbuilding a foundation\u201d and \u201cpowering up\u201d. Next comes the core \u201cdivine skill\u201d itself: this article introduces <code class=\"language-text\">hfai python<\/code>, which helps everyone <strong>launch and manage training tasks easily and quickly<\/strong>.<\/p>\n<h2 id=\"\u4f7f\u7528\u573a\u666f\">Usage Scenarios<\/h2>\n<p><a href=\"https:\/\/www.high-flyer.cn\/blog\/hfai_venv\">As described in the previous articles<\/a>, once users have transferred their local data, code, and environment to the remote Firefly cluster, they can submit tasks to train models. But how should the Firefly cluster's computing power be distributed fairly and sensibly among different tasks, and how can users view and manage their own tasks?<\/p>\n<p>All of this is handled through <code class=\"language-text\">hfai python<\/code>.<\/p>\n<h2 id=\"\u57fa\u672c\u6982\u5ff5\">Basic Concepts<\/h2>\n<p>Before using it, let us introduce a few basic concepts.<\/p>\n<p>High-Flyer AI manages the Firefly cluster with <strong>task-level time-sharing scheduling<\/strong>: <strong>computing power is allocated per task rather than per user<\/strong>. Users can freely submit tasks and request the scale of compute they need, and resources are allocated centrally by the cluster scheduling system. 
As shown in the figure below:<\/p>\n<p><span class=\"gatsby-resp-image-wrapper\"><a class=\"gatsby-resp-image-link\" href=\"https:\/\/hfai-static.high-flyer.cn\/static\/df50cdfcaa5b0f64df35a34b1d863593\/0a47e\/01.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"gatsby-resp-image-image\" title=\"01.png\" src=\"https:\/\/hfai-static.high-flyer.cn\/static\/df50cdfcaa5b0f64df35a34b1d863593\/0a47e\/01.png\" sizes=\"(max-width: 600px) 100vw, 600px\" srcset=\"https:\/\/hfai-static.high-flyer.cn\/static\/df50cdfcaa5b0f64df35a34b1d863593\/222b7\/01.png 163w,https:\/\/hfai-static.high-flyer.cn\/static\/df50cdfcaa5b0f64df35a34b1d863593\/ff46a\/01.png 325w,https:\/\/hfai-static.high-flyer.cn\/static\/df50cdfcaa5b0f64df35a34b1d863593\/0a47e\/01.png 600w\" alt=\"01.png\" \/><\/a><\/span><\/p>\n<p>Computing resources are sliced along the time dimension and allocated to different tasks. This approach not only raises the overall utilization of the cluster but also lets users flexibly scale the number of GPUs for different tasks, giving everyone the opportunity to use large-scale computing power for AI research.<\/p>\n<p><span class=\"gatsby-resp-image-wrapper\"><a class=\"gatsby-resp-image-link\" href=\"https:\/\/hfai-static.high-flyer.cn\/static\/bdce3353392d8abd035d069bd65ea73a\/302a4\/04.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"gatsby-resp-image-image\" title=\"04.png\" src=\"https:\/\/hfai-static.high-flyer.cn\/static\/bdce3353392d8abd035d069bd65ea73a\/a6d36\/04.png\" sizes=\"(max-width: 650px) 100vw, 650px\" srcset=\"https:\/\/hfai-static.high-flyer.cn\/static\/bdce3353392d8abd035d069bd65ea73a\/222b7\/04.png 163w,https:\/\/hfai-static.high-flyer.cn\/static\/bdce3353392d8abd035d069bd65ea73a\/ff46a\/04.png 325w,https:\/\/hfai-static.high-flyer.cn\/static\/bdce3353392d8abd035d069bd65ea73a\/a6d36\/04.png 
650w,https:\/\/hfai-static.high-flyer.cn\/static\/bdce3353392d8abd035d069bd65ea73a\/e548f\/04.png 975w,https:\/\/hfai-static.high-flyer.cn\/static\/bdce3353392d8abd035d069bd65ea73a\/302a4\/04.png 1080w\" alt=\"04.png\" \/><\/a><\/span><\/p>\n<p>On the Firefly cluster, training tasks have priorities: a higher-priority task can preempt the computing resources of lower-priority tasks, and tasks at the same priority level receive resources on a \u201c<strong>first come, first served<\/strong>\u201d basis. hfai provides several convenient tools (for example, <a href=\"https:\/\/doc.hfai.high-flyer.cn\/api\/client.html\">hfai.client<\/a> and <a href=\"https:\/\/www.high-flyer.cn\/blog\/checkpoint\">hfai.checkpoint<\/a>) that let your code <strong>receive cluster scheduling information and save checkpoints<\/strong> during training.<\/p>\n<h2 id=\"\u672c\u5730\u4ee3\u7801\u8c03\u8bd5\">Local Code Debugging<\/h2>\n<p>Normally, we launch deep learning training, for example training <a href=\"https:\/\/github.com\/HFAiLab\/hfai-models\/tree\/main\/bert\">Bert<\/a>, like this:<\/p>\n<div class=\"gatsby-highlight\" data-language=\"bash\">\n<pre class=\"language-bash\"><code class=\"language-bash\">python bert.py -c large.yml<\/code><\/pre>\n<\/div>\n<p>When you want to train a model on High-Flyer's Firefly cluster, you can first debug locally and fix any problems before submitting the task to the cluster. 
hfai can send simulated interrupt signals to help you check whether your code is ready to run on the cluster, as in the following example:<\/p>\n<div class=\"gatsby-highlight\" data-language=\"bash\">\n<pre class=\"language-bash\"><code class=\"language-bash\">hfai python bert.py -c large.yml ++ --suspend_seconds <span class=\"token number\">100<\/span> --life_state <span class=\"token number\">1<\/span><\/code><\/pre>\n<\/div>\n<p>Where:<\/p>\n<ul>\n<li><code class=\"language-text\">++<\/code>: marks the current run as a local debugging simulation.<\/li>\n<li><code class=\"language-text\">--suspend_seconds<\/code>: the number of seconds after which a simulated interrupt signal is sent.<\/li>\n<li><code class=\"language-text\">--life_state<\/code>: the training progress recorded for the task. Your code can set this value; for details, see <a href=\"https:\/\/doc.hfai.high-flyer.cn\/api\/client.html#hfai.client.set_whole_life_state\">here<\/a>.<\/li>\n<\/ul>\n<p>This lets you test whether your training code correctly receives the cluster's interrupt signal, saves a checkpoint of the model, and resumes from that checkpoint when the task is started again.<\/p>\n<h2 id=\"\u63d0\u4ea4\u4efb\u52a1\">Submitting Tasks<\/h2>\n<p>Once you have finished debugging locally, you can submit the task to Firefly with the following command:<\/p>\n<div class=\"gatsby-highlight\" data-language=\"bash\">\n<pre class=\"language-bash\"><code class=\"language-bash\">hfai python bert.py -c large.yml -- --nodes <span class=\"token number\">1<\/span> --name train_bert<\/code><\/pre>\n<\/div>\n<p>Where:<\/p>\n<ul>\n<li><code class=\"language-text\">--<\/code>: marks the task for submission to the cluster.<\/li>\n<li><code class=\"language-text\">--nodes<\/code>: the number of nodes used for training. <strong>The smallest unit a task can be allocated is one node with 8 A100 GPUs<\/strong>; if 
your task does not need that many GPUs, you can limit how many it uses in your code.<\/li>\n<li><code class=\"language-text\">--name<\/code>: the name of the task.<\/li>\n<\/ul>\n<p>Note that <strong>external free-tier users do not need to specify a priority with <code class=\"language-text\">--priority<\/code><\/strong>; the cluster automatically schedules their tasks on idle computing power. If you need high-priority training tasks that will not be interrupted, you can contact High-Flyer to discuss commercial use.<\/p>\n<h2 id=\"\u4efb\u52a1\u7ba1\u7406\">Task Management<\/h2>\n<p>After the task has been submitted successfully, you can check its status in <a href=\"https:\/\/studio.yinghuo.high-flyer.cn\/\">studio<\/a>. As shown in the screenshot below, the interface displays the submitted tasks, each task's GPU and CPU utilization, and how busy the cluster is overall.<\/p>\n<p><span class=\"gatsby-resp-image-wrapper\"><a class=\"gatsby-resp-image-link\" href=\"https:\/\/hfai-static.high-flyer.cn\/static\/b4e805f8a020f713c7fd378a3e80b597\/d1882\/02.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"gatsby-resp-image-image\" title=\"02.png\" src=\"https:\/\/hfai-static.high-flyer.cn\/static\/b4e805f8a020f713c7fd378a3e80b597\/a6d36\/02.png\" sizes=\"(max-width: 650px) 100vw, 650px\" srcset=\"https:\/\/hfai-static.high-flyer.cn\/static\/b4e805f8a020f713c7fd378a3e80b597\/222b7\/02.png 163w,https:\/\/hfai-static.high-flyer.cn\/static\/b4e805f8a020f713c7fd378a3e80b597\/ff46a\/02.png 325w,https:\/\/hfai-static.high-flyer.cn\/static\/b4e805f8a020f713c7fd378a3e80b597\/a6d36\/02.png 650w,https:\/\/hfai-static.high-flyer.cn\/static\/b4e805f8a020f713c7fd378a3e80b597\/e548f\/02.png 975w,https:\/\/hfai-static.high-flyer.cn\/static\/b4e805f8a020f713c7fd378a3e80b597\/3c492\/02.png 
1300w,https:\/\/hfai-static.high-flyer.cn\/static\/b4e805f8a020f713c7fd378a3e80b597\/d1882\/02.png 1562w\" alt=\"02.png\" \/><\/a><\/span><\/p>\n<p>You can also view the task's training status at each point in time, as shown below, to better manage and tune your deep learning model.<\/p>\n<p><span class=\"gatsby-resp-image-wrapper\"><a class=\"gatsby-resp-image-link\" href=\"https:\/\/hfai-static.high-flyer.cn\/static\/e156efa7f2b09acfc9f81533403a5d5d\/0d98f\/03.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"gatsby-resp-image-image\" title=\"03.png\" src=\"https:\/\/hfai-static.high-flyer.cn\/static\/e156efa7f2b09acfc9f81533403a5d5d\/a6d36\/03.png\" sizes=\"(max-width: 650px) 100vw, 650px\" srcset=\"https:\/\/hfai-static.high-flyer.cn\/static\/e156efa7f2b09acfc9f81533403a5d5d\/222b7\/03.png 163w,https:\/\/hfai-static.high-flyer.cn\/static\/e156efa7f2b09acfc9f81533403a5d5d\/ff46a\/03.png 325w,https:\/\/hfai-static.high-flyer.cn\/static\/e156efa7f2b09acfc9f81533403a5d5d\/a6d36\/03.png 650w,https:\/\/hfai-static.high-flyer.cn\/static\/e156efa7f2b09acfc9f81533403a5d5d\/e548f\/03.png 975w,https:\/\/hfai-static.high-flyer.cn\/static\/e156efa7f2b09acfc9f81533403a5d5d\/0d98f\/03.png 1276w\" alt=\"03.png\" \/><\/a><\/span><\/p>\n<p>Besides starting and stopping tasks and viewing logs on the studio page, you can also do this with the hfai tools from a terminal on your local computer. In addition, alongside <code class=\"language-text\">hfai python<\/code>, High-Flyer AI provides <code class=\"language-text\">hfai bash<\/code> for users who want finer-grained control over their training tasks through bash scripts. 
More information can be found in the <a href=\"https:\/\/doc.hfai.high-flyer.cn\/cli\/task.html\">official documentation<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>High-Flyer AI has released hfai, the deep learning suite it has refined over many years, and many fellow researchers and developers have asked to try it. The suite's features [&hellip;]<\/p>","protected":false},"author":1,"featured_media":478,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[7],"tags":[],"class_list":["post-477","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hfai-mantra"],"acf":[],"_links":{"self":[{"href":"https:\/\/high-flyer.in.suopu.cc\/en\/wp-json\/wp\/v2\/posts\/477","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/high-flyer.in.suopu.cc\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/high-flyer.in.suopu.cc\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/high-flyer.in.suopu.cc\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/high-flyer.in.suopu.cc\/en\/wp-json\/wp\/v2\/comments?post=477"}],"version-history":[{"count":1,"href":"https:\/\/high-flyer.in.suopu.cc\/en\/wp-json\/wp\/v2\/posts\/477\/revisions"}],"predecessor-version":[{"id":479,"href":"https:\/\/high-flyer.in.suopu.cc\/en\/wp-json\/wp\/v2\/posts\/477\/revisions\/479"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/high-flyer.in.suopu.cc\/en\/wp-json\/wp\/v2\/media\/478"}],"wp:attachment":[{"href":"https:\/\/high-flyer.in.suopu.cc\/en\/wp-json\/wp\/v2\/media?parent=477"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/high-flyer.in.suopu.cc\/en\/wp-json\/wp\/v2\/categories?post=477"},{"taxonomy":"post_tag","embeddable":tr
ue,"href":"https:\/\/high-flyer.in.suopu.cc\/en\/wp-json\/wp\/v2\/tags?post=477"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}