SharePoint – Export SharePoint Documents And Metadata Using Power Shell Script

by Manas Ranjan Moharana · Published July 29, 2018 · Updated July 30, 2018

Recently one of our client wanted to move out of SharePoint and the customer have a Sharepoint farm with 800 web sites. We have asked to extract all content from a SharePoint site with their metadata for archival. They need all documents to be download to a folder structure as it is in the folder hierarchy in the site.They would also like to have a CSV file for each document library, which will contain all metadata of each document including some other important data of the document .e.g. Site Url, Document Path after downloaded, Mime Type and Checksum of each document. Customer will use this information to migrate the contents to destination which is implemented using different technology. So we have used PowerShell to address the customer’s requirement.

After googling some time I have come across a nice blog written by Nik Craik .We have used the script from this blog as our base script, then modified and added new features as per our requirement. Along with the required features, we have included logging feature. The log file provides details like, number of documents in a site and number of document exported from that site. It also log the details of the error if there is any during the export.

In our case we have used a csv file, which will provide all required inputs to the script. The file contains list of site urls, need to be export and a holder name for each site so that a folder will be create as per holder name and the document will be download to their respected holder name folder.

In this script we have, restrict the script so that it will only download documents from the document library which template id is 101 and avoid all other document library. You can modify the script as per your requirement.

I put detailed comments in the script, to make it self explanatory.

<#
.SYNOPSIS
	Export the metadata of all document to a CSV file. We can provide multiple site url so 
	that at a time we can export documents from multiple sites.

.PARAMETER InputFilepath
	The File path is a path where the csv file exist. this file contains all the site url and holder name.
	This file will provide the inputs to the script
     e.g "C:\M\Script\Site_Details.csv"
	 
 .PARAMETER Destination
	The location path,where the script will Download files and its metadata .
	e.g "C:\M\Export"
#>

   param
    (
    	[parameter(Mandatory=$true)][string]$InputFilepath,
        [parameter(Mandatory=$true)][string]$Destination
    )

	 #Check and add PowerShell snap in
    if(-not(Get-PSSnapin | Where { $_.Name -eq "Microsoft.SharePoint.PowerShell"}))
    {
		Add-PSSnapin Microsoft.SharePoint.PowerShell -erroraction SilentlyContinue;
    }
	
	
#The File path is a path where the log file will be create.
#e.g  "C:\M\Export\logs.txt"
$logPath = ''  

#Create a new folder under the site destination folder for document where the actual folder structure is very deep.
# So that some time,the long url issue can be avoied.
$longFolderDocumentPath="\Long Path Docs\"

$FinalDestination=''

# variable to hold total number of document in the provided site
[int]$SiteDocumentCount= 0

[int]$DownloadedDocCount=0 

#Export site by taking both input
ExportSite $InputFilepath $Destination

# The initial method to start exporting both site document and its metadata
#function ExportSite($site,$destination)
#$filepath : the path of csv file
#$Defaultdestination : location on file system , to save the exported files
function ExportSite($filepath,$Defaultdestination)
{ 
	$logPath =$Defaultdestination+'\logs.txt'
     try 
    {  
	   #In this csv we have used ';' as delimeter because , few site url contains ',' so it breaks the coad
	    $createList = Import-Csv -path $filepath -Delimiter ';'
    }
    catch
    {
      write-host "Import-Csv : Could not find the site details csv file at provided path " $filepath

       Write "Import-Csv : Could not find the site details csv file at provided path ::$( $filepath)" >>$logPath
       Write "Error:: "$_.Exception.Message >>$logPath

      return
    }
    
	 #iterating through each site url
     ForEach($row in $createList)
     { 
	 #getting each site url
      	 $SiteUrl = $row.SiteUrl
            if($SiteUrl.length -gt 0)
            {
			   #Getting holder name of the site if exists
                $currFolderPath=$row.HolderName

				# check if holder name is not exist then it will set the destination path to root folder 
				#other wise it will create a folder with name of the holder name for the site under the root folder and 
				#set it as destination folder
                if($currFolderPath.Length -gt 0)
                {
                    $FinalDestination=$Defaultdestination+'\'+$currFolderPath  
                    $currFolderPath=''
                }
                else
                {
                    $FinalDestination=$Defaultdestination        
                }        
               try
               {
                 $exportStartTime = Get-Date -Format "hh:mm:ss"
                 Write "------------------------------------------------------------------------" >>$logPath
                 Write  "$($exportStartTime): :  The Export started for the site :  $($SiteUrl) " >>$logPath

                 #Download Site Documents + Versions
                 Download_DocumentLibraries $SiteUrl $FinalDestination
                
                 #Export Site Lists and Document Library Metadata
                 Export_Metadata $SiteUrl $FinalDestination

               }
               catch
               {
                  write-host " Issue with accessing the site : " $SiteUrl
                  Write " Issue with accessing the site : : $($SiteUrl)" >>$logPath
                  Write "Error: "$_.Exception.Message >>$logPath
                  return             
                             
               }
               
                  $exportEndTime = Get-Date -Format "hh:mm:ss"

                  Write  "$($exportEndTime) : :  The Export ended for the site : $($SiteUrl)"  >>$logPath
                   Write "-------------------------------------------------------------------" >>$logPath
            }
            else
            {
                continue 
            }
        }

}

  #Function :Export_Metadata
  #Description : Export_Metadata function collect all required metadata and put them in a csv file and save the file in provided location.
  #sourceweb: document library's parent Sit url, for which the metadata need to be export.
  #metadatadestination: location where the metadata csv file will be stored

function Export_Metadata($sourceweb, $metadatadestination)
{
    try
    {
	    Write-Host "Creating Lists and Metadata"
	    $sourceSPweb = Get-SPWeb -Identity $sourceweb 
    }

    catch 
    {   
        Write-Host "Cannot find a SPWeb object with provided Id or Url : " $sourceweb

        Write "Cannot find an SPWeb object with provided Id or Url : " $sourceweb >>$logPath
        Write "Error: "$_.Exception.Message >>$logPath
        return
    } 

    #Create a folder using the site's name to contains the documents and metadata of a site
	$siteDocFolder = $metadatadestination + "\" +$sourceSPweb.Title +"\Document Libraries"
	$metadataFolder = $metadatadestination+"\"+$sourceSPweb.Title+"\Document Metadatas"
	$createMetaDataFolder = New-Item $metadataFolder -type directory 
	$metadatadestination = $metadataFolder
	#as per our requirement we have restrict the document library to custom document library only
    $mylistsToIgnore = @(101)
    $listArray = @()

	# iterate through all document library
	foreach($list in $sourceSPweb.Lists)
	{
            $ListItemCollection =@()
            $listTemplate = $sourceSPweb.ListTemplates | ? {$_.FeatureId -eq $list.TemplateFeatureId}
           if ($mylistsToIgnore.Contains($listTemplate.Type_Client) -and $list.Hidden -eq $False -and !$list.DefaultViewUrl.Contains('/FormServerTemplates/') -and !$list.DefaultViewUrl.Contains('/SiteCollectionDocuments/') -and !$list.DefaultViewUrl.Contains('/Style Library/') -and !$list.DefaultViewUrl.Contains('/SiteAssets/') )
            {
                $ListItems = $list.Items

                if($ListItems.Count -gt 0)
                { 
                    $Listlocation = $metadatadestination+"\"+$list.Title+".csv"
	                Write-Host "Exporting List MetaData: " $list.Title
		                             
                     #Get All List items where Status is "In Progress"
                     try
                     {
                         $list.Items | foreach {

                         try{                        
                                #Construct Document Folder Path \url where the document already downloaded
                                $FolderPath=$_['FileDirRef'] 

                                 if($FolderPath.length -gt 0 )
                                 {
                                    #$FolderPath=$FolderPath -replace $sourceSPweb.ServerRelativeUrl,''
                                    $FolderPath=$FolderPath.Replace($sourceSPweb.ServerRelativeUrl,'')

                                    $pathArray=$FolderPath.split('/',[System.StringSplitOptions]::RemoveEmptyEntries)
                                    $Folderpath =$siteDocFolder
            
                                    for($i=0; $i -le $pathArray.length; $i++)
                                    {
                                        $Folderpath += '\'+ $pathArray[$i]
                                    }
                                 }
  
                                $DocumentPath= $Folderpath + $_['Name']
                                $DocumentPath=$DocumentPath.Replace('\','/')

                                  if($DocumentPath.length -ge 250)
                                    {
                                      $DocumentPath= $siteDocFolder + $longFolderDocumentPath + $_['Name']
                                       $DocumentPath=$DocumentPath.Replace('\','/')
                                    }
                                # End of Construct Document Folder Path\url 

                            
                                 #Add Site Url, Document path, Mime type, Checksum as a column to the output csv file as document metadata
                                 try
                                 {
                                  $myMimeType = $(Get-MimeType -CheckFile $DocumentPath)
                                  $checkSum = $(Get-Checksum  -CheckFile $DocumentPath)         
                                 } 
                                 catch
                                 {                                 
                                    Write-Host "Document not found at : "$DocumentPath
                                    Write "Document not found at : "$DocumentPath >>$logPath
                                    Write "Error: "$_.Exception.Message >>$logPath
                                    $myMimeType=''
                                    $checkSum=''
                                 }
                                 $ExportItem = New-Object PSObject
                                 $ExportItem | Add-Member -MemberType NoteProperty -name "Site URL" -value $sourceSPweb.Url
                                 $ExportItem | Add-Member -MemberType NoteProperty -name "Site Name" -value $sourceSPweb.Title
                                 $ExportItem | Add-Member -MemberType NoteProperty -name "Document Path" -value $DocumentPath
                                 $ExportItem | Add-Member -MemberType NoteProperty -name "Mime Type" -value $myMimeType
                                 $ExportItem | Add-Member -MemberType NoteProperty -name "Checksum" -value $checkSum
                                 $myMimeType=''
                                 $checkSum=''
                                                                                             
                                     #Get Each field
                                     foreach($Field in $_.Fields)
                                     {
                                      
                                      try{

									   # uncomment the following line if we dont need the metadata of hidden field
                                        #if(!$Field.Hidden)
                                        #{       
                                
                                            $fieldValue =$_[$Field.InternalName]
                                             if($Field.InternalName -eq 'Modified_x0020_By' -or $Field.InternalName -eq 'Created_x0020_By')
                                             {
                                        
                                                if($fieldValue.length -gt 0 )
                                                { 
                                                    $fieldValue=$fieldValue.Replace('\','/')                                                                                         
                                                }

                                                $ExportItem | Add-Member -MemberType NoteProperty -name $Field.Title -value $fieldValue -Force                                    
                                           
                                             }                       

                                            elseif($Field.Title -eq $null -or $Field.InternalName -eq 'LinkFilenameNoMenu' -or $Field.InternalName -eq 'LinkFilename' )
                                            {
                                                if($Field.InternalName -eq 'Author' -or $Field.InternalName -eq 'Editor' -or $Field.InternalName -eq 'Owner')  
                                                {
                                                   $userValue=$fieldValue
                                                   if($userValue.length -gt 0 )
                                                   {
                                                       $userValueArray=$userValue.split('#',[System.StringSplitOptions]::RemoveEmptyEntries)                                           
                                                       if($userValueArray.length -gt 0 )
                                                       {
                                                        $User=$userValueArray[1]
                                                         $ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value $User -Force
                                                       }
                                                   }
                                                   else
                                                   {
                                                     $ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value '' -Force
                                                   }

                                                }
                                                else
                                                {                    
                                                    $ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value $fieldValue -Force
                                                }
                                            } 
                                            else
                                            {
                                                if($Field.InternalName -eq 'Author' -or $Field.InternalName -eq 'Editor' -or $Field.InternalName -eq 'Owner')  
                                                {
                                                   $userValue=$fieldValue
                                                   if($userValue.length -gt 0 )
                                                   {                                  
                                                       $userValueArray=$userValue.split('#',[System.StringSplitOptions]::RemoveEmptyEntries)
                                           
                                                       if($userValueArray.length -gt 0 )
                                                       {
                                                        $User=$userValueArray[1]

                                                         $ExportItem | Add-Member -MemberType NoteProperty -name $Field.Title -value $User -Force
                                                       }
                                                   }
                                                   else
                                                   {
                                                     $ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value '' -Force
                                                   }

                                                }
                                                else
                                                {                    
                                                   $ExportItem | Add-Member -MemberType NoteProperty -name $Field.Title -value $fieldValue -Force
                                                }
                                     
                                            }
                                        #} # hidden fields

                                         }

                                        catch
                                        {
                                         Write-Host "Error to get metadata  value of field:" $Field.Title "of current item"
                                         Write " WARN : Error to get metadata  value of field:" $Field.Title "of current item">>$logPath
                                         Write "Error: "$_.Exception.Message >>$logPath
                                         continue
                                        }
                   
                                    }
                              

                                #Add the object with property to an Array
                                $ListItemCollection += $ExportItem 
                                
                            }

                            catch 
                            {
                                Write-Host "Error to get metadata of document : " $_['FileDirRef']  "of current document Library"
                                Write "WARN : Error  to get metadata  of document:" $_['FileDirRef']  >>$logPath
                                Write "Error: "$_.Exception.Message >>$logPath
                                continue
                            }  
                        }
                    }
                    catch
                    {
                    Write "Error: "$_.Exception.Message >>$logPath
                    Write-Host "Error: "$_.Exception.Message 
                    
                    } 
                     

                #Export the result Array to CSV file
                $ListItemCollection | Export-CSV $Listlocation -NoTypeInformation -Encoding UTF8
                Write-host "Library Metadata Exported to $($Listlocation) for site $($list.Title)"
                Write  "Library Metadata Exported to $($Listlocation) for site $($list.Title)">>$logPath
             }
        }

    }
}

# Function: Download_DocumentLibraries
# Description: Calls DownloadDocuments recursiveley to download all document libraries in a site.
# Variables
# $webUrl: The URL of the site to download all document libraries
function Download_DocumentLibraries($webUrl,$documentDestination)
{
    try
    {
	    Write-Host "Downloading Documents from site :" $webUrl
        Write "Downloading Documents from site :" $webUrl >>$logPath

	    $web = Get-SPWeb -Identity $webUrl #-ErrorAction Stop
    }

    catch 
    {   
         Write-Host "Cannot find an SPWeb object with provided Id or Url : " $web

          Write "Cannot find an SPWeb object with provided Id or Url : " $web >>$logPath
          Write "Error: "$_.Exception.Message >>$logPath
        return
    }


	#Create a folder using the site's name
	$siteFolder = $documentDestination + "\" +$web.Title+"\Document Libraries"
	$createSiteFolder = New-Item $siteFolder -type directory 
	$destination = $siteFolder
    $mylistsToIgnore = @(101)  # This will filter only those document library which templare id is 101.    
    $global:SiteDocumentCount=0   #  it will set the number of documentSet Counter to 0 so that finally it will show the number of documents downloaded
    $global:DownloadedDocCount=0 


	foreach($list in $web.Lists)
	{
      try
      {
        $listTemplate = $web.ListTemplates | ? {$_.FeatureId -eq $list.TemplateFeatureId}
           if($listTemplate.Type_Client -eq 101)
           {
                if ($mylistsToIgnore.Contains($listTemplate.Type_Client) -and $list.Hidden -eq $False -and !$list.DefaultViewUrl.Contains('/FormServerTemplates/') -and !$list.DefaultViewUrl.Contains('/SiteCollectionDocuments/') -and !$list.DefaultViewUrl.Contains('/Style Library/') -and !$list.DefaultViewUrl.Contains('/SiteAssets/'))
		        {
			        Write-Host "Downloading Documenfts from  " $list.Title " Library"
                   # Write-host "The number of documents in "$list.Title " Library is : " $list.Items.Count
                    Write "Downloading Documenfts from $($list.Title)  Library" >> $logPath  
                    $global:SiteDocumentCount= $global:SiteDocumentCount+$list.Items.Count
			        $listUrl = $web.Url +"/"+ $list.RootFolder.Url
			        #Download root files
			        DownloadDocuments $list.RootFolder.Url
			        #Download files in folders
			        foreach ($folder in $list.Folders) 
			        {
    		          DownloadDocuments $folder.Url
			        }

                   # Write-host "The number of documents downloaded from $($list.Title) Library is : $($global:DocumentCount)"                                       
                
		        }
           } #end if
        }

        catch 
        {   
             Write-Host "Issue with  access the library : " $list.DefaultViewUrl

              Write "Issue with  access the library : " $list.DefaultViewUrl >>$logPath
              Write "Error: "$_.Exception.Message >>$logPath
        
        }
	}

      Write "********************************************************************************************" >>$logPath
      Write "The number of documents in  $($web.Title)  Site is :  $($global:SiteDocumentCount)" >>$logPath
      Write "The number of documents downloaded from $($web.Title)  Site is : $($global:DownloadedDocCount)" >>$logPath 
      Write "********************************************************************************************" >>$logPath
}

# Function: DownloadDocuments
# Description: Downloads a document library's files; called GetGileVersions to download versions.
# Variables
# $folderUrl: The Document Library to Download
function DownloadDocuments($folderUrl)
{
    $folder = $web.GetFolder($folderUrl)
    foreach ($file in $folder.Files) 
	{
      try
      {
            #Ensure destination directory
		    $destinationfolder = $destination + "\" + $folder.Url 

            $currnetfilepath= $destinationfolder + "\" + $file.Name

            $shortPath=$destination +$longFolderDocumentPath

            # Try to create a new folder with name "Long path Doc" if current document folder path is greater than 260

             if($currnetfilepath.length -ge 250) 
              {                 
				#Create the folder in root folder instade of following the actual folder structure
                 if (!(Test-Path -path $shortPath))
                  {
                        $dest1 = New-Item $shortPath -type directory 
                  }
              }
              else
              {
                  if (!(Test-Path -path $destinationfolder))
                  {
                        $dest = New-Item $destinationfolder -type directory 
                  }              
              }

              #download file in Long path docs folder if path excced teh limit length
              if($currnetfilepath.length -ge 250)
               {
                  $currnetfilepath=  $shortPath + $file.Name
               }                         

           #Download file  
 
            $binary = $file.OpenBinary()
            $stream = New-Object System.IO.FileStream($currnetfilepath), Create
            $writer = New-Object System.IO.BinaryWriter($stream)
            $writer.write($binary)
            $writer.Close()

             $global:DownloadedDocCount++

            #Download file versions. If you don't need versions, comment the line below.
		    GetVersionFiles $file
        }
        catch 
        {   
             Write-Host "Cannot download the document with the Url : " $file.ServerRelativeUrl "from " $file.Web.ServerRelativeUrl "Site"

             Write "WARN : Cannot download the document with the Url : " $file.ServerRelativeUrl "from " $file.Web.ServerRelativeUrl "Site" >>$logPath
              Write "Error: "$_.Exception.Message >>$logPath
        }		
	}
}

# Function: GetVersionFiles
# Description: Downloads all versions of every file in a document library
# Variables
# $WebURL: The URL of the website that contains the document library
# $DocLibURL: The location of the document Library in the site
# $DownloadLocation: The path to download the files to

function GetVersionFiles($file)
{
	foreach($version in $file.Versions)
	{
        try
        {
		    #Add version label to file in format: [Filename]_v[version#].[extension]
            if($file.Name.length -gt 0)

            {
		        $filesplit = $file.Name.split(".") 
		        $fullname = $filesplit[0] 
		        $fileext = $filesplit[1] 
		        $FullFileName = $fullname+"_v"+$version.VersionLabel+"."+$fileext			

		        #Can't create an SPFile object from historical versions, but CAN download via HTTP
		        #Create the full File URL using the Website URL and version's URL
		        $fileURL = $webUrl+"/"+$version.Url

		        #Full Download path including filename
		        $DownloadPath = $destinationfolder+"\"+$FullFileName

		        #Download the file from the version's URL, download to the $DownloadPath location
		        HTTPDownloadFile "$fileURL" "$DownloadPath"
            }
        }
        catch 
        {   
              Write-Host "Cannot download the document for version  : "  $version.VersionLabel

              Write  "WARN :Cannot download the document of version  : "  $version.VersionLabel "for :" $fileURL >>$logPath
              Write "Error: "$_.Exception.Message >>$logPath

        }
	}
}

# Function: HTTPDownloadFile
# Description: Downloads a file using webclient
# Variablesis
# $ServerFileLocation: Where the source file is located on the web
# $DownloadPath: The destination to download to
function HTTPDownloadFile($ServerFileLocation, $DownloadPath) 
{
    try
    {
	    $webclient = New-Object System.Net.WebClient
	    $webClient.UseDefaultCredentials = $true
	    $webclient.DownloadFile($ServerFileLocation,$DownloadPath)
    }    
    catch
    {
        Write " WARN : Cannot download the document for version  : "  $version.VersionLabel  "for file :" $ServerFileLocation >>$logPath
        Write "Error: "$_.Exception.Message >>$logPath
    }
}

#Getting Mime Type of the document
function Get-MimeType()
 { 
    param([parameter(Mandatory=$true, ValueFromPipeline=$true)][ValidateNotNullorEmpty()][System.IO.FileInfo]$CheckFile) 
    begin { 
        Add-Type -AssemblyName "System.Web"         
        [System.IO.FileInfo]$check_file = $CheckFile 
        [sting]$mime_type = $null 
    } 
    process { 
        if ($check_file.Exists) {  
            $mime_type = [System.Web.MimeMapping]::GetMimeMapping($check_file.FullName)  
        } 
        else { 
            $mime_type = "false" 
        } 
    } 
    end { return $mime_type } 
}

#Getting Checksum of the document
function Get-Checksum() 
{ 
    param([parameter(Mandatory=$true, ValueFromPipeline=$true)][ValidateNotNullorEmpty()][System.IO.FileInfo]$CheckFile) 
    begin { 
        Add-Type -AssemblyName "System.Web"         
        [System.IO.FileInfo]$check_file = $CheckFile 
        [string]$checksum = $null 
    } 
    process { 
        if ($check_file.Exists) {  
            $checksum = Get-FileHash -LiteralPath ($check_file.FullName) -Algorithm SHA256
        } 
        else { 
            $checksum = "false" 
        } 
    } 
    end { return $checksum.Split(";")[1].Split("=")[1] } 
}

Thanks for Reading 🙂

Keep reading, share your thoughts, experiences. Feel free to contact us to discuss more. If you have any suggestion / feedback / doubt, you are most welcome.

Stay tuned on Knowledge-Junction, will come up with more such articles.

You may also like...

Leave a ReplyCancel reply

Trending Articles

Recent Posts

Archives

SharePoint – Export SharePoint Documents And Metadata Using Power Shell Script

Share this:

Like this:

Related Articles

You may also like...

SUCCESS – Behaviour/Responsibility

Implement Carousel in SPFx webpart using react-slick

New feature – Microsoft 365 – SharePoint online administration – Design Ideas are now available

Leave a ReplyCancel reply

Trending Articles

Recent Posts

Archives

Discover more from Microsoft 365