{"id":106,"date":"2025-08-03T17:11:53","date_gmt":"2025-08-03T17:11:53","guid":{"rendered":"https:\/\/ledatalab.fr\/?page_id=106"},"modified":"2025-08-17T18:12:34","modified_gmt":"2025-08-17T18:12:34","slug":"data-lake","status":"publish","type":"page","link":"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/","title":{"rendered":"Data Lake"},"content":{"rendered":"<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-medium\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"300\" src=\"https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration-300x300.png\" alt=\"Illustration for the term Data Lake\" class=\"wp-image-216\" srcset=\"https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration-300x300.png 300w, https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration-1024x1024.png 1024w, https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration-150x150.png 150w, https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration-768x768.png 768w, https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration-1536x1536.png 1536w, https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration.png 2048w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/figure>\n<\/div>\n\n\n<h1 class=\"wp-block-heading has-text-align-center\">Data Lake<\/h1>\n\n\n\n<p class=\"has-text-align-center\"><em>Category: Architecture &amp; Infrastructure<\/em><\/p>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Definition<\/h2>\n\n\n\n<p>A Data Lake is a centralized storage repository that holds large volumes of raw data in its original format, whether structured (such as SQL databases), semi-structured (such as JSON or CSV files), or unstructured (videos, 
images, logs, text\u2026).<\/p>\n\n\n\n<p>Unlike traditional databases, a Data Lake requires no upfront data modeling: structure is applied when the data is read (schema-on-read), not when it is written. Data is simply poured into the lake (hence the name) and can be transformed, analyzed, or combined later, as needs arise.<\/p>\n\n\n\n<p>It is often used in the following contexts:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>big data<\/li>\n\n\n\n<li>advanced analytics<\/li>\n\n\n\n<li>data science<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Example use case<\/h2>\n\n\n\n<p>A retail company collects:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>sales receipts (structured),<\/li>\n\n\n\n<li>web browsing logs (semi-structured),<\/li>\n\n\n\n<li>surveillance camera videos (unstructured).<\/li>\n<\/ul>\n\n\n\n<p>All of these files are stored as-is in a Data Lake, with no prior sorting, and are then used by teams of analysts or data scientists to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>detect fraud<\/li>\n\n\n\n<li>optimize shelf layouts<\/li>\n\n\n\n<li>correlate purchasing behavior with the weather<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Difference from a Data Warehouse<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th class=\"has-text-align-center\" data-align=\"center\">Data Lake<\/th><th class=\"has-text-align-center\" data-align=\"center\">Data Warehouse<\/th><\/tr><\/thead><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\">Raw data<\/td><td class=\"has-text-align-center\" data-align=\"center\">Cleaned &amp; 
modeled data<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">Stores any type of data<\/td><td class=\"has-text-align-center\" data-align=\"center\">Mostly structured data<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">Low storage cost<\/td><td class=\"has-text-align-center\" data-align=\"center\">More expensive but higher query performance<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">Used for open-ended exploration<\/td><td class=\"has-text-align-center\" data-align=\"center\">Used for reliable reporting<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Related tools &amp; technologies<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cloud<\/strong>: Amazon S3 (AWS), Azure Data Lake Storage, Google Cloud Storage<br><\/li>\n\n\n\n<li><strong>Open-source ecosystems<\/strong>: Hadoop, HDFS, Apache Spark<br><\/li>\n\n\n\n<li><strong>Processing platforms<\/strong>: Databricks, Presto, Snowflake (hybrid mode)<br><\/li>\n\n\n\n<li><strong>Access and analysis tools<\/strong>: Power BI, Tableau, Python, SQL<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83e\udde0 Key takeaway<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">A Data Lake is a centralized reservoir of raw data, designed to store any type of data at scale for future analysis or advanced use cases such as machine learning.<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Data Lake Category: Architecture &amp; Infrastructure Definition A Data Lake is a centralized storage repository that holds large volumes of raw data in its original format, 
whether structured (such as SQL databases), semi-structured (such as JSON or CSV files), or unstructured (videos, images, logs, text\u2026). Unlike traditional databases &#8230; <a title=\"Data Lake\" class=\"read-more\" href=\"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/\" aria-label=\"Read more about Data Lake\">Read more<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":15,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-106","page","type-page","status-publish"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Lake - Le Data Lab<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Lake - Le Data Lab\" \/>\n<meta property=\"og:description\" content=\"Data Lake Cat\u00e9gorie : Architecture &amp; Infrastructures D\u00e9finition Un Data Lake est un espace de stockage centralis\u00e9 qui permet de conserver de grandes quantit\u00e9s de donn\u00e9es brutes, dans leur format d\u2019origine, qu\u2019elles soient structur\u00e9es (comme des bases SQL), semi structur\u00e9es (comme des fichiers JSON, CSV) ou non structur\u00e9es (vid\u00e9os, images, logs, textes\u2026). Contrairement aux bases ... 
Lire la suite\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/\" \/>\n<meta property=\"og:site_name\" content=\"Le Data Lab\" \/>\n<meta property=\"article:modified_time\" content=\"2025-08-17T18:12:34+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration-1024x1024.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@ledatalab\" \/>\n<meta name=\"twitter:label1\" content=\"Dur\u00e9e de lecture estim\u00e9e\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/index.php\\\/glossaire\\\/data-lake\\\/\",\"url\":\"https:\\\/\\\/ledatalab.fr\\\/index.php\\\/glossaire\\\/data-lake\\\/\",\"name\":\"Data Lake - Le Data 
Lab\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/index.php\\\/glossaire\\\/data-lake\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/index.php\\\/glossaire\\\/data-lake\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ledatalab.fr\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/Data-Lake_illustration-300x300.png\",\"datePublished\":\"2025-08-03T17:11:53+00:00\",\"dateModified\":\"2025-08-17T18:12:34+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/index.php\\\/glossaire\\\/data-lake\\\/#breadcrumb\"},\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ledatalab.fr\\\/index.php\\\/glossaire\\\/data-lake\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/index.php\\\/glossaire\\\/data-lake\\\/#primaryimage\",\"url\":\"https:\\\/\\\/ledatalab.fr\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/Data-Lake_illustration.png\",\"contentUrl\":\"https:\\\/\\\/ledatalab.fr\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/Data-Lake_illustration.png\",\"width\":2048,\"height\":2048,\"caption\":\"Illustration pour le terme Data Lake\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/index.php\\\/glossaire\\\/data-lake\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\\\/\\\/ledatalab.fr\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Glossaire\",\"item\":\"https:\\\/\\\/ledatalab.fr\\\/index.php\\\/glossaire\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Data Lake\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/#website\",\"url\":\"https:\\\/\\\/ledatalab.fr\\\/\",\"name\":\"Le Data Lab\",\"description\":\"Le blog pour comprendre, explorer et utiliser la 
Data.\",\"publisher\":{\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/#organization\"},\"alternateName\":\"ledatalab\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ledatalab.fr\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"fr-FR\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/#organization\",\"name\":\"Le Data Lab\",\"alternateName\":\"ledatalab\",\"url\":\"https:\\\/\\\/ledatalab.fr\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/ledatalab.fr\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/couleur-vertical-a-droite-superpose.png\",\"contentUrl\":\"https:\\\/\\\/ledatalab.fr\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/couleur-vertical-a-droite-superpose.png\",\"width\":1000,\"height\":1000,\"caption\":\"Le Data Lab\"},\"image\":{\"@id\":\"https:\\\/\\\/ledatalab.fr\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/ledatalab\",\"https:\\\/\\\/www.instagram.com\\\/ledatalab\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/ledatalab\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Data Lake - Le Data Lab","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/","og_locale":"fr_FR","og_type":"article","og_title":"Data Lake - Le Data Lab","og_description":"Data Lake Cat\u00e9gorie : Architecture &amp; Infrastructures D\u00e9finition Un Data Lake est un espace de stockage centralis\u00e9 qui permet de conserver de grandes quantit\u00e9s de donn\u00e9es brutes, dans leur format d\u2019origine, qu\u2019elles soient structur\u00e9es (comme des bases SQL), semi structur\u00e9es (comme des fichiers JSON, CSV) ou non structur\u00e9es (vid\u00e9os, images, logs, textes\u2026). Contrairement aux bases ... Lire la suite","og_url":"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/","og_site_name":"Le Data Lab","article_modified_time":"2025-08-17T18:12:34+00:00","og_image":[{"width":1024,"height":1024,"url":"https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration-1024x1024.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_site":"@ledatalab","twitter_misc":{"Dur\u00e9e de lecture estim\u00e9e":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/","url":"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/","name":"Data Lake - Le Data 
Lab","isPartOf":{"@id":"https:\/\/ledatalab.fr\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/#primaryimage"},"image":{"@id":"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/#primaryimage"},"thumbnailUrl":"https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration-300x300.png","datePublished":"2025-08-03T17:11:53+00:00","dateModified":"2025-08-17T18:12:34+00:00","breadcrumb":{"@id":"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/#breadcrumb"},"inLanguage":"fr-FR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/"]}]},{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/#primaryimage","url":"https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration.png","contentUrl":"https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/Data-Lake_illustration.png","width":2048,"height":2048,"caption":"Illustration pour le terme Data Lake"},{"@type":"BreadcrumbList","@id":"https:\/\/ledatalab.fr\/index.php\/glossaire\/data-lake\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/ledatalab.fr\/"},{"@type":"ListItem","position":2,"name":"Glossaire","item":"https:\/\/ledatalab.fr\/index.php\/glossaire\/"},{"@type":"ListItem","position":3,"name":"Data Lake"}]},{"@type":"WebSite","@id":"https:\/\/ledatalab.fr\/#website","url":"https:\/\/ledatalab.fr\/","name":"Le Data Lab","description":"Le blog pour comprendre, explorer et utiliser la 
Data.","publisher":{"@id":"https:\/\/ledatalab.fr\/#organization"},"alternateName":"ledatalab","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ledatalab.fr\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"fr-FR"},{"@type":"Organization","@id":"https:\/\/ledatalab.fr\/#organization","name":"Le Data Lab","alternateName":"ledatalab","url":"https:\/\/ledatalab.fr\/","logo":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/ledatalab.fr\/#\/schema\/logo\/image\/","url":"https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/couleur-vertical-a-droite-superpose.png","contentUrl":"https:\/\/ledatalab.fr\/wp-content\/uploads\/2025\/08\/couleur-vertical-a-droite-superpose.png","width":1000,"height":1000,"caption":"Le Data Lab"},"image":{"@id":"https:\/\/ledatalab.fr\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/ledatalab","https:\/\/www.instagram.com\/ledatalab\/","https:\/\/www.linkedin.com\/company\/ledatalab"]}]}},"_links":{"self":[{"href":"https:\/\/ledatalab.fr\/index.php\/wp-json\/wp\/v2\/pages\/106","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ledatalab.fr\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ledatalab.fr\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ledatalab.fr\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ledatalab.fr\/index.php\/wp-json\/wp\/v2\/comments?post=106"}],"version-history":[{"count":4,"href":"https:\/\/ledatalab.fr\/index.php\/wp-json\/wp\/v2\/pages\/106\/revisions"}],"predecessor-version":[{"id":305,"href":"https:\/\/ledatalab.fr\/index.php\/wp-json\/wp\/v2\/pages\/106\/revisions\/305"}],"up":[{"embeddable":true,"href":"https:\/\/ledatalab.fr\/index.php\/wp-json\/wp\/v2\/pages\/15"}],"wp:attachment":[{"href":"https:\/\/ledatalab.fr\/index.php\/wp-j
son\/wp\/v2\/media?parent=106"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}