SharePoint – Export SharePoint Documents And Metadata Using Power Shell Script
Recently one of our client wanted to move out of SharePoint and the customer have a Sharepoint farm with 800 web sites. We have asked to extract all content from a SharePoint site with their metadata for archival. They need all documents to be download to a folder structure as it is in the folder hierarchy in the site.They would also like to have a CSV file for each document library, which will contain all metadata of each document including some other important data of the document .e.g. Site Url, Document Path after downloaded, Mime Type and Checksum of each document. Customer will use this information to migrate the contents to destination which is implemented using different technology. So we have used PowerShell to address the customer’s requirement.
After googling some time I have come across a nice blog written by Nik Craik .We have used the script from this blog as our base script, then modified and added new features as per our requirement. Along with the required features, we have included logging feature. The log file provides details like, number of documents in a site and number of document exported from that site. It also log the details of the error if there is any during the export.
In our case we have used a csv file, which will provide all required inputs to the script. The file contains list of site urls, need to be export and a holder name for each site so that a folder will be create as per holder name and the document will be download to their respected holder name folder.
In this script we have, restrict the script so that it will only download documents from the document library which template id is 101 and avoid all other document library. You can modify the script as per your requirement.
I put detailed comments in the script, to make it self explanatory.
<# .SYNOPSIS Export the metadata of all document to a CSV file. We can provide multiple site url so that at a time we can export documents from multiple sites. .PARAMETER InputFilepath The File path is a path where the csv file exist. this file contains all the site url and holder name. This file will provide the inputs to the script e.g "C:\M\Script\Site_Details.csv" .PARAMETER Destination The location path,where the script will Download files and its metadata . e.g "C:\M\Export" #> param ( [parameter(Mandatory=$true)][string]$InputFilepath, [parameter(Mandatory=$true)][string]$Destination ) #Check and add PowerShell snap in if(-not(Get-PSSnapin | Where { $_.Name -eq "Microsoft.SharePoint.PowerShell"})) { Add-PSSnapin Microsoft.SharePoint.PowerShell -erroraction SilentlyContinue; } #The File path is a path where the log file will be create. #e.g "C:\M\Export\logs.txt" $logPath = '' #Create a new folder under the site destination folder for document where the actual folder structure is very deep. # So that some time,the long url issue can be avoied. $longFolderDocumentPath="\Long Path Docs\" $FinalDestination='' # variable to hold total number of document in the provided site [int]$SiteDocumentCount= 0 [int]$DownloadedDocCount=0 #Export site by taking both input ExportSite $InputFilepath $Destination
# The initial method to start exporting both site document and its metadata #function ExportSite($site,$destination) #$filepath : the path of csv file #$Defaultdestination : location on file system , to save the exported files function ExportSite($filepath,$Defaultdestination) { $logPath =$Defaultdestination+'\logs.txt' try { #In this csv we have used ';' as delimeter because , few site url contains ',' so it breaks the coad $createList = Import-Csv -path $filepath -Delimiter ';' } catch { write-host "Import-Csv : Could not find the site details csv file at provided path " $filepath Write "Import-Csv : Could not find the site details csv file at provided path ::$( $filepath)" >>$logPath Write "Error:: "$_.Exception.Message >>$logPath return } #iterating through each site url ForEach($row in $createList) { #getting each site url $SiteUrl = $row.SiteUrl if($SiteUrl.length -gt 0) { #Getting holder name of the site if exists $currFolderPath=$row.HolderName # check if holder name is not exist then it will set the destination path to root folder #other wise it will create a folder with name of the holder name for the site under the root folder and #set it as destination folder if($currFolderPath.Length -gt 0) { $FinalDestination=$Defaultdestination+'\'+$currFolderPath $currFolderPath='' } else { $FinalDestination=$Defaultdestination } try { $exportStartTime = Get-Date -Format "hh:mm:ss" Write "------------------------------------------------------------------------" >>$logPath Write "$($exportStartTime): : The Export started for the site : $($SiteUrl) " >>$logPath #Download Site Documents + Versions Download_DocumentLibraries $SiteUrl $FinalDestination #Export Site Lists and Document Library Metadata Export_Metadata $SiteUrl $FinalDestination } catch { write-host " Issue with accessing the site : " $SiteUrl Write " Issue with accessing the site : : $($SiteUrl)" >>$logPath Write "Error: "$_.Exception.Message >>$logPath return } $exportEndTime = Get-Date -Format "hh:mm:ss" Write "$($exportEndTime) : : The Export ended for the site : $($SiteUrl)" >>$logPath Write "-------------------------------------------------------------------" >>$logPath } else { continue } } }
#Function :Export_Metadata #Description : Export_Metadata function collect all required metadata and put them in a csv file and save the file in provided location. #sourceweb: document library's parent Sit url, for which the metadata need to be export. #metadatadestination: location where the metadata csv file will be stored function Export_Metadata($sourceweb, $metadatadestination) { try { Write-Host "Creating Lists and Metadata" $sourceSPweb = Get-SPWeb -Identity $sourceweb } catch { Write-Host "Cannot find a SPWeb object with provided Id or Url : " $sourceweb Write "Cannot find an SPWeb object with provided Id or Url : " $sourceweb >>$logPath Write "Error: "$_.Exception.Message >>$logPath return } #Create a folder using the site's name to contains the documents and metadata of a site $siteDocFolder = $metadatadestination + "\" +$sourceSPweb.Title +"\Document Libraries" $metadataFolder = $metadatadestination+"\"+$sourceSPweb.Title+"\Document Metadatas" $createMetaDataFolder = New-Item $metadataFolder -type directory $metadatadestination = $metadataFolder #as per our requirement we have restrict the document library to custom document library only $mylistsToIgnore = @(101) $listArray = @() # iterate through all document library foreach($list in $sourceSPweb.Lists) { $ListItemCollection =@() $listTemplate = $sourceSPweb.ListTemplates | ? {$_.FeatureId -eq $list.TemplateFeatureId} if ($mylistsToIgnore.Contains($listTemplate.Type_Client) -and $list.Hidden -eq $False -and !$list.DefaultViewUrl.Contains('/FormServerTemplates/') -and !$list.DefaultViewUrl.Contains('/SiteCollectionDocuments/') -and !$list.DefaultViewUrl.Contains('/Style Library/') -and !$list.DefaultViewUrl.Contains('/SiteAssets/') ) { $ListItems = $list.Items if($ListItems.Count -gt 0) { $Listlocation = $metadatadestination+"\"+$list.Title+".csv" Write-Host "Exporting List MetaData: " $list.Title #Get All List items where Status is "In Progress" try { $list.Items | foreach { try{ #Construct Document Folder Path \url where the document already downloaded $FolderPath=$_['FileDirRef'] if($FolderPath.length -gt 0 ) { #$FolderPath=$FolderPath -replace $sourceSPweb.ServerRelativeUrl,'' $FolderPath=$FolderPath.Replace($sourceSPweb.ServerRelativeUrl,'') $pathArray=$FolderPath.split('/',[System.StringSplitOptions]::RemoveEmptyEntries) $Folderpath =$siteDocFolder for($i=0; $i -le $pathArray.length; $i++) { $Folderpath += '\'+ $pathArray[$i] } } $DocumentPath= $Folderpath + $_['Name'] $DocumentPath=$DocumentPath.Replace('\','/') if($DocumentPath.length -ge 250) { $DocumentPath= $siteDocFolder + $longFolderDocumentPath + $_['Name'] $DocumentPath=$DocumentPath.Replace('\','/') } # End of Construct Document Folder Path\url #Add Site Url, Document path, Mime type, Checksum as a column to the output csv file as document metadata try { $myMimeType = $(Get-MimeType -CheckFile $DocumentPath) $checkSum = $(Get-Checksum -CheckFile $DocumentPath) } catch { Write-Host "Document not found at : "$DocumentPath Write "Document not found at : "$DocumentPath >>$logPath Write "Error: "$_.Exception.Message >>$logPath $myMimeType='' $checkSum='' } $ExportItem = New-Object PSObject $ExportItem | Add-Member -MemberType NoteProperty -name "Site URL" -value $sourceSPweb.Url $ExportItem | Add-Member -MemberType NoteProperty -name "Site Name" -value $sourceSPweb.Title $ExportItem | Add-Member -MemberType NoteProperty -name "Document Path" -value $DocumentPath $ExportItem | Add-Member -MemberType NoteProperty -name "Mime Type" -value $myMimeType $ExportItem | Add-Member -MemberType NoteProperty -name "Checksum" -value $checkSum $myMimeType='' $checkSum='' #Get Each field foreach($Field in $_.Fields) { try{ # uncomment the following line if we dont need the metadata of hidden field #if(!$Field.Hidden) #{ $fieldValue =$_[$Field.InternalName] if($Field.InternalName -eq 'Modified_x0020_By' -or $Field.InternalName -eq 'Created_x0020_By') { if($fieldValue.length -gt 0 ) { $fieldValue=$fieldValue.Replace('\','/') } $ExportItem | Add-Member -MemberType NoteProperty -name $Field.Title -value $fieldValue -Force } elseif($Field.Title -eq $null -or $Field.InternalName -eq 'LinkFilenameNoMenu' -or $Field.InternalName -eq 'LinkFilename' ) { if($Field.InternalName -eq 'Author' -or $Field.InternalName -eq 'Editor' -or $Field.InternalName -eq 'Owner') { $userValue=$fieldValue if($userValue.length -gt 0 ) { $userValueArray=$userValue.split('#',[System.StringSplitOptions]::RemoveEmptyEntries) if($userValueArray.length -gt 0 ) { $User=$userValueArray[1] $ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value $User -Force } } else { $ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value '' -Force } } else { $ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value $fieldValue -Force } } else { if($Field.InternalName -eq 'Author' -or $Field.InternalName -eq 'Editor' -or $Field.InternalName -eq 'Owner') { $userValue=$fieldValue if($userValue.length -gt 0 ) { $userValueArray=$userValue.split('#',[System.StringSplitOptions]::RemoveEmptyEntries) if($userValueArray.length -gt 0 ) { $User=$userValueArray[1] $ExportItem | Add-Member -MemberType NoteProperty -name $Field.Title -value $User -Force } } else { $ExportItem | Add-Member -MemberType NoteProperty -name $Field.InternalName -value '' -Force } } else { $ExportItem | Add-Member -MemberType NoteProperty -name $Field.Title -value $fieldValue -Force } } #} # hidden fields } catch { Write-Host "Error to get metadata value of field:" $Field.Title "of current item" Write " WARN : Error to get metadata value of field:" $Field.Title "of current item">>$logPath Write "Error: "$_.Exception.Message >>$logPath continue } } #Add the object with property to an Array $ListItemCollection += $ExportItem } catch { Write-Host "Error to get metadata of document : " $_['FileDirRef'] "of current document Library" Write "WARN : Error to get metadata of document:" $_['FileDirRef'] >>$logPath Write "Error: "$_.Exception.Message >>$logPath continue } } } catch { Write "Error: "$_.Exception.Message >>$logPath Write-Host "Error: "$_.Exception.Message } #Export the result Array to CSV file $ListItemCollection | Export-CSV $Listlocation -NoTypeInformation -Encoding UTF8 Write-host "Library Metadata Exported to $($Listlocation) for site $($list.Title)" Write "Library Metadata Exported to $($Listlocation) for site $($list.Title)">>$logPath } } } }
# Function: Download_DocumentLibraries # Description: Calls DownloadDocuments recursiveley to download all document libraries in a site. # Variables # $webUrl: The URL of the site to download all document libraries function Download_DocumentLibraries($webUrl,$documentDestination) { try { Write-Host "Downloading Documents from site :" $webUrl Write "Downloading Documents from site :" $webUrl >>$logPath $web = Get-SPWeb -Identity $webUrl #-ErrorAction Stop } catch { Write-Host "Cannot find an SPWeb object with provided Id or Url : " $web Write "Cannot find an SPWeb object with provided Id or Url : " $web >>$logPath Write "Error: "$_.Exception.Message >>$logPath return } #Create a folder using the site's name $siteFolder = $documentDestination + "\" +$web.Title+"\Document Libraries" $createSiteFolder = New-Item $siteFolder -type directory $destination = $siteFolder $mylistsToIgnore = @(101) # This will filter only those document library which templare id is 101. $global:SiteDocumentCount=0 # it will set the number of documentSet Counter to 0 so that finally it will show the number of documents downloaded $global:DownloadedDocCount=0 foreach($list in $web.Lists) { try { $listTemplate = $web.ListTemplates | ? {$_.FeatureId -eq $list.TemplateFeatureId} if($listTemplate.Type_Client -eq 101) { if ($mylistsToIgnore.Contains($listTemplate.Type_Client) -and $list.Hidden -eq $False -and !$list.DefaultViewUrl.Contains('/FormServerTemplates/') -and !$list.DefaultViewUrl.Contains('/SiteCollectionDocuments/') -and !$list.DefaultViewUrl.Contains('/Style Library/') -and !$list.DefaultViewUrl.Contains('/SiteAssets/')) { Write-Host "Downloading Documenfts from " $list.Title " Library" # Write-host "The number of documents in "$list.Title " Library is : " $list.Items.Count Write "Downloading Documenfts from $($list.Title) Library" >> $logPath $global:SiteDocumentCount= $global:SiteDocumentCount+$list.Items.Count $listUrl = $web.Url +"/"+ $list.RootFolder.Url #Download root files DownloadDocuments $list.RootFolder.Url #Download files in folders foreach ($folder in $list.Folders) { DownloadDocuments $folder.Url } # Write-host "The number of documents downloaded from $($list.Title) Library is : $($global:DocumentCount)" } } #end if } catch { Write-Host "Issue with access the library : " $list.DefaultViewUrl Write "Issue with access the library : " $list.DefaultViewUrl >>$logPath Write "Error: "$_.Exception.Message >>$logPath } } Write "********************************************************************************************" >>$logPath Write "The number of documents in $($web.Title) Site is : $($global:SiteDocumentCount)" >>$logPath Write "The number of documents downloaded from $($web.Title) Site is : $($global:DownloadedDocCount)" >>$logPath Write "********************************************************************************************" >>$logPath }
# Function: DownloadDocuments # Description: Downloads a document library's files; called GetGileVersions to download versions. # Variables # $folderUrl: The Document Library to Download function DownloadDocuments($folderUrl) { $folder = $web.GetFolder($folderUrl) foreach ($file in $folder.Files) { try { #Ensure destination directory $destinationfolder = $destination + "\" + $folder.Url $currnetfilepath= $destinationfolder + "\" + $file.Name $shortPath=$destination +$longFolderDocumentPath # Try to create a new folder with name "Long path Doc" if current document folder path is greater than 260 if($currnetfilepath.length -ge 250) { #Create the folder in root folder instade of following the actual folder structure if (!(Test-Path -path $shortPath)) { $dest1 = New-Item $shortPath -type directory } } else { if (!(Test-Path -path $destinationfolder)) { $dest = New-Item $destinationfolder -type directory } } #download file in Long path docs folder if path excced teh limit length if($currnetfilepath.length -ge 250) { $currnetfilepath= $shortPath + $file.Name } #Download file $binary = $file.OpenBinary() $stream = New-Object System.IO.FileStream($currnetfilepath), Create $writer = New-Object System.IO.BinaryWriter($stream) $writer.write($binary) $writer.Close() $global:DownloadedDocCount++ #Download file versions. If you don't need versions, comment the line below. GetVersionFiles $file } catch { Write-Host "Cannot download the document with the Url : " $file.ServerRelativeUrl "from " $file.Web.ServerRelativeUrl "Site" Write "WARN : Cannot download the document with the Url : " $file.ServerRelativeUrl "from " $file.Web.ServerRelativeUrl "Site" >>$logPath Write "Error: "$_.Exception.Message >>$logPath } } }
# Function: GetVersionFiles # Description: Downloads all versions of every file in a document library # Variables # $WebURL: The URL of the website that contains the document library # $DocLibURL: The location of the document Library in the site # $DownloadLocation: The path to download the files to function GetVersionFiles($file) { foreach($version in $file.Versions) { try { #Add version label to file in format: [Filename]_v[version#].[extension] if($file.Name.length -gt 0) { $filesplit = $file.Name.split(".") $fullname = $filesplit[0] $fileext = $filesplit[1] $FullFileName = $fullname+"_v"+$version.VersionLabel+"."+$fileext #Can't create an SPFile object from historical versions, but CAN download via HTTP #Create the full File URL using the Website URL and version's URL $fileURL = $webUrl+"/"+$version.Url #Full Download path including filename $DownloadPath = $destinationfolder+"\"+$FullFileName #Download the file from the version's URL, download to the $DownloadPath location HTTPDownloadFile "$fileURL" "$DownloadPath" } } catch { Write-Host "Cannot download the document for version : " $version.VersionLabel Write "WARN :Cannot download the document of version : " $version.VersionLabel "for :" $fileURL >>$logPath Write "Error: "$_.Exception.Message >>$logPath } } }
# Function: HTTPDownloadFile # Description: Downloads a file using webclient # Variablesis # $ServerFileLocation: Where the source file is located on the web # $DownloadPath: The destination to download to function HTTPDownloadFile($ServerFileLocation, $DownloadPath) { try { $webclient = New-Object System.Net.WebClient $webClient.UseDefaultCredentials = $true $webclient.DownloadFile($ServerFileLocation,$DownloadPath) } catch { Write " WARN : Cannot download the document for version : " $version.VersionLabel "for file :" $ServerFileLocation >>$logPath Write "Error: "$_.Exception.Message >>$logPath } }
#Getting Mime Type of the document function Get-MimeType() { param([parameter(Mandatory=$true, ValueFromPipeline=$true)][ValidateNotNullorEmpty()][System.IO.FileInfo]$CheckFile) begin { Add-Type -AssemblyName "System.Web" [System.IO.FileInfo]$check_file = $CheckFile [sting]$mime_type = $null } process { if ($check_file.Exists) { $mime_type = [System.Web.MimeMapping]::GetMimeMapping($check_file.FullName) } else { $mime_type = "false" } } end { return $mime_type } } #Getting Checksum of the document function Get-Checksum() { param([parameter(Mandatory=$true, ValueFromPipeline=$true)][ValidateNotNullorEmpty()][System.IO.FileInfo]$CheckFile) begin { Add-Type -AssemblyName "System.Web" [System.IO.FileInfo]$check_file = $CheckFile [string]$checksum = $null } process { if ($check_file.Exists) { $checksum = Get-FileHash -LiteralPath ($check_file.FullName) -Algorithm SHA256 } else { $checksum = "false" } } end { return $checksum.Split(";")[1].Split("=")[1] } }
Thanks for Reading 🙂
Keep reading, share your thoughts, experiences. Feel free to contact us to discuss more. If you have any suggestion / feedback / doubt, you are most welcome.
Stay tuned on Knowledge-Junction, will come up with more such articles.
You must log in to post a comment.