/*
Objective: Since Windows 10 File Explorer search seems messed-up on my laptop (and computers of at least
some others reporting online) since late 2019, try to make something for finding files/folders on my laptop
while waiting for a fix!

Following Apache POI library JARs added to the (NetBeans) project:
    poi-4.1.1
    poi-ooxml-4.1.1
    poi-ooxml-schemas-4.1.1
    poi-scratchpad-4.1.1
    xmlbeans-3.1.0
    commons-compress-1.19
    commons-math3-3.6.1
    commons-collections4-4.4

v10_a 23Feb2020
Not a working solution, just archiving the code as a partial record of tweaks tried.
--Just trialling replacement of [new Thread + interrupt call] with [ExecutorService + shutdownNow() call]
  to see if that would eliminate the known issue of having a single result from a still-running previous
  search printed when a new search is initiated
  ...no, does not help by itself (not necessarily surprising as I think shutdownNow() calls interrupt()
  internally), though delaying progression after the shutdownNow() call with awaitTermination(...)
  with sufficient timeout may do so.
  See below for further notes, the trial of an alternative delay that polls the value of isTerminated(),
  and the complication if the search has not (yet) sent any results to the terminal method in the stream.
  Also realised that the synchronization seems to make no difference now (and subsequently found it is not
  needed with the current version of the 'new Thread' code either ...maybe it was the introduction of
  the interrupt-old-thread approach in a previous version).
  Will not integrate these changes now - just archive this version for future reference in case I want to
  reconsider use of the ExecutorService for other reasons.

Items that might be added/addressed later: see comments at bottom of file
*/

import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Font;
import java.awt.GridLayout;
import java.awt.Insets;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Stream;
import javax.swing.BorderFactory;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.border.EmptyBorder;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

class FindFileOrFolder_v10_a
{
    String missingStartMessage = "";
    String missingTargetMessage = "";
    int walkDepth = Integer.MAX_VALUE; // specifies how many subdirectory levels to go down
    boolean caseSensitive; // whether search term is treated as case-sensitive (default is no)
    boolean lookInside; // whether to search for target inside txt and csv files also
    // Thread searchThread = new Thread(); // --ref to allow arrangement for new instance of thread
    //                                     // that runs search code to interrupt any previous still-running instance
    ExecutorService searchService = Executors.newSingleThreadExecutor(); // --trialling

    // Declaring (and
    // initialising) fields for GUI components...
    JLabel startFolderLabel = new JLabel("Enter the path of the folder from within which you want to (start your) search...");
    JTextArea startFolderTextArea = new JTextArea(2, 60); // input to specify folder within which to (start) searching
    JButton depthButton = new JButton("Search only in starting folder (do not include subfolders)");
    JButton withinFileButton = new JButton("Search also text within following file types: txt, doc, docx, csv, xls, xlsx");
    JPanel howDeepPanel = new JPanel(new GridLayout()); // to hold the depthButton and withinFileButton
    JPanel wherePanel = new JPanel(new BorderLayout()); // to hold startFolderLabel, startFolderTextArea & howDeepPanel
    JLabel targetNameLabel = new JLabel("Enter the name or partial name of a file or folder you want to find...");
    JTextArea targetNameTextArea = new JTextArea(2, 60); // input to specify file/folder names for which to search
    JButton caseButton = new JButton("Make search case-sensitive");
    JPanel whatPanel = new JPanel(new BorderLayout()); // targetNameLabel, targetNameTextArea & caseButton
    JPanel inputsPanel = new JPanel(new BorderLayout()); // to hold wherePanel & whatPanel
    JButton findButton = new JButton("Find files/folders");
    JTextArea statusDisplayTextArea = new JTextArea(1, 50); // displays "in progress" vs "finished"
    JTextArea resultsDisplayTextArea = new JTextArea(40, 100); // displays output, i.e.
    // paths for files/folders found
    JPanel findAndShowPanel = new JPanel(new BorderLayout()); // to hold findButton & resultsDisplayTextArea
    JFrame frame = new JFrame("FindFileOrFolder"); // to hold all above (sub)panels

    FindFileOrFolder_v10_a() // constructor, called when main method runs
    {
        startFolderLabel.setBorder(new EmptyBorder(10, 10, 0, 0));
        startFolderTextArea.setLineWrap(true);
        startFolderTextArea.setMargin(new Insets(5, 10, 5, 10));
        JScrollPane startSP = new JScrollPane(startFolderTextArea);
        startSP.setBorder(BorderFactory.createMatteBorder(2, 0, 0, 0, new Color(245, 245, 245)));
        depthButton.addActionListener(actionEvent ->
            {
                if (walkDepth == Integer.MAX_VALUE)
                {
                    walkDepth = 1;
                    depthButton.setText("Revert to searching in subfolders also");
                }
                else // (walkDepth is 1)
                {
                    walkDepth = Integer.MAX_VALUE;
                    depthButton.setText("Revert to searching in starting folder only");
                }
            }
        ); // toggles walk depth between all-subfolders (starting state) and no-subfolders
        howDeepPanel.add(depthButton, BorderLayout.WEST);
        withinFileButton.addActionListener(actionEvent ->
            {
                if (lookInside == false) // (could rewrite as !lookInside)
                {
                    lookInside = true;
                    withinFileButton.setText("Revert to not searching also text within files (docx, doc, txt, csv)");
                }
                else // (lookInside is true)
                {
                    lookInside = false;
                    withinFileButton.setText("Revert to searching also text within following file types: docx, doc, txt, csv");
                }
            }
        ); // toggles between searching just file names and text within any txt/csv files also
        howDeepPanel.add(withinFileButton, BorderLayout.EAST);
        wherePanel.setBackground(new Color(245, 245, 245));
        wherePanel.add(startFolderLabel, BorderLayout.NORTH);
        wherePanel.add(startSP, BorderLayout.CENTER);
        wherePanel.add(howDeepPanel, BorderLayout.SOUTH);

        targetNameLabel.setBorder(new EmptyBorder(10, 10, 0, 0));
        targetNameTextArea.setLineWrap(true);
        targetNameTextArea.setMargin(new Insets(5, 10, 5, 10));
        JScrollPane targetNameSP = new JScrollPane(targetNameTextArea);
        targetNameSP.setBorder(BorderFactory.createMatteBorder(2, 0, 0, 0, new Color(245, 245, 245)));
        caseButton.addActionListener(actionEvent ->
            {
                if (caseSensitive == false)
                {
                    caseSensitive = true;
                    caseButton.setText("Make search case-insensitive again");
                }
                else // (caseSensitive is true)
                {
                    caseSensitive = false;
                    caseButton.setText("Make search case-sensitive again");
                }
            }
        ); // toggles search between case-insensitive (starting state) and case-sensitive
        whatPanel.setBackground(new Color(245, 245, 245));
        whatPanel.add(targetNameLabel, BorderLayout.NORTH);
        whatPanel.add(targetNameSP, BorderLayout.CENTER);
        whatPanel.add(caseButton, BorderLayout.SOUTH);

        inputsPanel.add(wherePanel, BorderLayout.NORTH);
        inputsPanel.add(whatPanel, BorderLayout.SOUTH);

        findButton.addActionListener(actionEvent ->
            { // (the 'actionPerformed' method body of the implicit ActionListener...)
                // searchThread.interrupt(); // --stop thread from any incomplete previous search [find better way?]
                searchService.shutdownNow(); // --trialling
                // --trialling also (one or the other)...I think these can prevent the 'wayward prev search result', but...
                // try {searchService.awaitTermination(1000, TimeUnit.MILLISECONDS);} catch (InterruptedException ex){System.out.println(ex);}
                // while (!searchService.isTerminated())
                // {
                //     try {Thread.sleep(10);} catch (InterruptedException ex) {System.out.println(ex);}
                // }
                // ...while these work well when the interrupted search is already printing results
                // (as is the case if a very common search term is used), searches that have not yet
                // printed results (where the term is rare or absent in the examined files/folders)
                // can take a long time to finish up.
                // I speculate that this is because the stream's terminal method below, .allMatch(p -> !Thread.interrupted()),
                // does not receive a p (path element)
                // IN FACT, I realise that non-printing searches also keep processing after the shutdownNow call anyway
                // (or in the 'new Thread' version, after the interrupt() call)
                // (processing is not visible to user, though, just see it in the form of interruption exceptions
                // continuing to print in console); will make a note on this at bottom of file also,
                // and there also in v10_b, and, retrospectively, in v9
                resultsDisplayTextArea.setText(null); // clear any previous text
                // --trialling ExecutorService, find that synchronization now seems to make no (positive) difference
                // synchronized(findButton) // to prevent more than one search running at the same time [find better way?]
                // {
                String startText = startFolderTextArea.getText().trim();
                Path startPath = null; // path to folder within which to (start) searching
                boolean startPathValid = true;
                try
                {
                    startPath = Paths.get(startText); // ...for some start inputs on this call
                }
                catch (Exception e) // to handle possible InvalidPathException
                {
                    startPathValid = false;
                }
                Path validatedStartPath = startPath; // (need an effectively final variable for use later in a lambda)
                if (startPathValid) // even if real path input, want to check that it is a folder (cf a file), so reassign to...
                {
                    startPathValid = Files.isDirectory(startPath); // (as empty string arg seems to generate Path regarded
                    // as valid (root/current folder?), so for now including check re startText.isEmpty() below)
                }
                String targetText = targetNameTextArea.getText(); // (partial) names of files/folders for which to search
                final String target = caseSensitive ?
                        targetText : targetText.toLowerCase();
                if (startText.isEmpty() || !startPathValid)
                {
                    missingStartMessage = "No valid start path supplied" + "\n";
                }
                if (target.isEmpty())
                {
                    missingTargetMessage = "No search term supplied" + "\n";
                }
                if (!startText.isEmpty() && startPathValid && !target.isEmpty())
                { // Only run process below if target text and valid start path have been supplied (avoid wasteful processing)
                    statusDisplayTextArea.setText("Search in progress...");
                    resultsDisplayTextArea.setForeground(Color.BLACK);
                    resultsDisplayTextArea.setText(null); // clear any previous text before displaying the results
                    // searchThread = new Thread(() ->
                    // --trialling
                    searchService = Executors.newSingleThreadExecutor();
                    searchService.execute(()->
                    {
                        try
                        {
                            Stream<Path> targetStream = Files.find(validatedStartPath, walkDepth, (p,a) ->
                                    !isHiddenHandler(p) // to exclude hidden (temporary etc) files
                                    && (   toStringHandled(p.getFileName()).contains(target) // target in file/folder name...
                                        || (lookInside ? fileContainsTarget(p, target) : false) // or target within file text
                                       )
                                );
                            targetStream
                                .peek(p -> resultsDisplayTextArea.append(p + "\n")) // print paths found in output/display area
                                .allMatch(p -> !Thread.interrupted());
                            // Changing from printing in forEach to printing in peek followed by the allMatch
                            // reduces residual printing of results from an interrupted search among the new search
                            // results/message to one
                            // (Though if I in addition make the print in the peek conditional on !Thread.interrupted()
                            // get the reverse - EVERYTHING in first search gets printed anyway!!!???)
                            // (Note also: putting the targetStream declaration-instantiation above into try-with-resources
                            // did not seem to make any difference, though might be advisable anyway?)
                            if (resultsDisplayTextArea.getText().equals(""))
                            {
                                resultsDisplayTextArea.setForeground(Color.BLUE);
                                resultsDisplayTextArea.setText("No results found");
                            } // (seems inelegant, but have not thought of a way to do directly from the stream code yet)
                            statusDisplayTextArea.setText("Search completed");
                        }
                        catch (IOException ex)
                        {
                            if (ex.getClass().getName().equals("java.nio.file.NoSuchFileException"))
                            {
                                missingInputsMessage();
                            } // (probably not needed now, though, as should not get to try clause without valid start path)
                        }
                    });
                    // searchThread.start();
                    // --trialling
                }
                else
                {
                    missingInputsMessage();
                }
                // --trialling
                // } // end of synchronized block
            } // (...end of 'actionPerformed' method body of the implicit ActionListener)
        );
        findButton.setMargin(new Insets(10, 10, 10, 10));
        findButton.setFont(new Font("SansSerif", Font.BOLD, 20));
        findAndShowPanel.add(findButton, BorderLayout.NORTH);
        statusDisplayTextArea.setEditable(false);
        statusDisplayTextArea.setLineWrap(true);
        statusDisplayTextArea.setWrapStyleWord(true);
        statusDisplayTextArea.setMargin(new Insets(5, 5, 5, 5));
        statusDisplayTextArea.setFont(new Font("SERIF", Font.BOLD, 20));
        statusDisplayTextArea.setForeground(Color.GREEN);
        findAndShowPanel.add(new JScrollPane(statusDisplayTextArea), BorderLayout.CENTER);
        resultsDisplayTextArea.setEditable(false);
        resultsDisplayTextArea.setLineWrap(true);
        resultsDisplayTextArea.setWrapStyleWord(true);
        resultsDisplayTextArea.setMargin(new Insets(5, 5, 5, 5));
        findAndShowPanel.add(new JScrollPane(resultsDisplayTextArea), BorderLayout.SOUTH);

        frame.add(inputsPanel, BorderLayout.NORTH);
        frame.add(findAndShowPanel, BorderLayout.SOUTH);
        frame.setResizable(false); // (buttons disappear if user drags frame bottom up,
        // while if it's dragged down, the extra space just appears as a gap between the panels,
        // so just hard-coding big resultsDisplayTextArea for the moment;
        // looking briefly online, see descriptions/code for how to make resizable by dragging, but not trivial)
        frame.setVisible(true);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.pack(); // (may need to keep this positioned last)
    }

    void missingInputsMessage() // puts message in display area if one or both user inputs missing/incorrect
    {
        statusDisplayTextArea.setText(null); // --v8 added (realised should have been present in previous version(s))
        resultsDisplayTextArea.setForeground(Color.red);
        resultsDisplayTextArea.setText(missingStartMessage + missingTargetMessage);
        missingStartMessage = ""; // reset for future clicks
        missingTargetMessage = ""; // (ditto)
    }

    boolean isHiddenHandler (Path p) // as Files.isHidden(...) throws checked exception...
    { // ...easier to handle it externally than in stream coded in findButton's addActionListener above
        boolean result = false;
        try
        {
            result = Files.isHidden(p);
        }
        catch (IOException ex)
        {
            System.out.println("From isHiddenHandler: " + ex);
        }
        return result;
    }

    String toStringHandled (Path p) // as Path's toString() can throw NullPointerException...
    { // ...easier to handle it externally than in stream coded in findButton's addActionListener above
        String result = "";
        try
        {
            result = p.toString(); // p is filename for start path, and is null if that is root, e.g. C:\
        }
        catch (NullPointerException npEx) // ...in which case NullPointerException is thrown
        {
            resultsDisplayTextArea.setText("Note: As you are starting from the root, "
                    + "the search may terminate early if subfolders without access permission are encountered. "
                    + "(This is a known issue, and may also affect searches starting further down the hierarchy, "
                    + "in which case you will not see feedback, unfortunately)."
                    + " May be addressed in a subsequent version of the program."
                    + "\n\n");
        }
        if (!caseSensitive) // if user has chosen case-sensitive option, this does not happen...
        {
            result = result.toLowerCase(); // ...search term made all-lowercase
        }
        return result;
    }

    boolean fileContainsTarget (Path p, String target)
    {
        String fileName = (p.getFileName().toString().toLowerCase());
        if (Files.isReadable(p)) // (have not found I actually need this isReadable check, but testing has been limited)
        {
            if (
                (
                    fileName.endsWith(".txt") || // defining file types in which to search...
                    fileName.endsWith(".csv") // ...and could add any other 'UTF text' types if there are any
                )
               )
            {
                return txtORcsvHasTarget(p, target);
            }
            else if (fileName.endsWith(".docx"))
            {
                return docxHasTarget(p, target);
            }
            else if (fileName.endsWith(".doc"))
            {
                return docHasTarget(p, target);
            }
            // v9 added...
            else if (fileName.endsWith(".xlsx") || fileName.endsWith(".xls"))
            {
                return xlHasTarget(p, target);
            }
        } // (or could parse the file extension as a string and use a switch statement instead)
        return false;
    }

    boolean txtORcsvHasTarget (Path p, String target)
    {
        Stream<String> linesFromFile = Stream.empty();
        try
        {
            linesFromFile = Files.lines(p, StandardCharsets.ISO_8859_1);
            // Note for future reference: adding second arg StandardCharsets.ISO_8859_1 may avoid
            // MalformedInputException being thrown for non-UTF-encoded files,
            // e.g. docx, xlsx, pdf, which I am not including here as only 'gibberish' symbols are displayed of course
            if (!caseSensitive)
            {
                linesFromFile = linesFromFile.map(s -> s.toLowerCase()); // or could use String::toLowerCase I think
            }
            return linesFromFile.anyMatch(s -> s.contains(target));
            // General note(s): As anyMatch(...) will return as soon as a match is found (if any)
            // it will not waste resources/time processing subsequent file lines.
            // Order in which lines are processed is not important,
            // so might investigate later if making stream parallel improves speed
        }
        catch (IOException ex)
        {
            System.out.println("From txtORcsvHasTarget, IOException: " + ex);
        }
        catch (Exception ex) // --v9 added
        {
            System.out.println("From txtORcsvHasTarget: " + ex);
        }
        return false;
    }

    boolean docxHasTarget(Path p, String target)
    {
        try
        {
            FileInputStream fIS = new FileInputStream(p.toFile());
            // (Is there a more modern API I could use here with Apache POI?)
            XWPFDocument docx = new XWPFDocument(fIS);
            List<XWPFParagraph> paragraphList = docx.getParagraphs();
            // (Would be nice to generate a stream rather than read everything into memory before processing,
            // but there does not seem to be a method for that in current Apache POI?)
            // May be a way to do for Excel files when I address them later?
            // http://poi.apache.org/components/spreadsheet/limitations.html
            Stream<String> paragraphStringStream = paragraphList.stream().map(para -> para.getText());
            if (!caseSensitive)
            {
                paragraphStringStream = paragraphStringStream.map(s -> s.toLowerCase()); // or could use String::toLowerCase I think
            }
            return paragraphStringStream.anyMatch(s -> s.contains(target));
        }
        catch (IOException ex)
        {
            System.out.println("From docxHasTarget, IOException: " + ex);
        }
        catch (org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException ex) // See Note2 at bottom of file
        {
            // System.out.println("org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException from docxHasTarget(...)
            //         method, "
            //         + "processing file " + p + " ...ignore and continue.");
            // Keeping disabled unless needed for troubleshooting as these prints are visible to end-user in GUI
            // and can slow the search if many 'problem' .docx files are encountered
        }
        catch (Exception ex) // --v9 added
        {
            System.out.println("From docxHasTarget: " + ex);
        }
        return false;
    }

    boolean docHasTarget(Path p, String target)
    {
        try
        {
            FileInputStream fIS = new FileInputStream(p.toFile());
            HWPFDocument doc = new HWPFDocument(fIS);
            WordExtractor extractor = new WordExtractor(doc);
            String[] paragraphStrings = extractor.getParagraphText(); // (getText() would extract all text as single String)
            Stream<String> paragraphStringStream = Arrays.stream(paragraphStrings);
            if (!caseSensitive)
            {
                paragraphStringStream = paragraphStringStream.map(s -> s.toLowerCase()); // or could use String::toLowerCase I think
            }
            return paragraphStringStream.anyMatch(s -> s.contains(target));
        }
        catch (IOException ex)
        {
            System.out.println("From docHasTarget, IOException: " + ex);
        }
        catch (IllegalArgumentException ex) // See Note1 at bottom of file
        {
            // System.out.println("IllegalArgumentException from docHasTarget(...)
            //         method ,"
            //         + "processing file " + p + ", probably because"
            //         + " a non-doc file with a mis-applied doc extension was encountered...ignore and continue.");
            // Keeping disabled unless needed for troubleshooting as these prints are visible to end-user in GUI
            // and can slow the search if many 'problem' .doc files are encountered
        }
        catch (Exception ex) // --v9 added
        {
            System.out.println("From docHasTarget: " + ex);
        }
        return false;
    }

    boolean xlHasTarget(Path p, String target)
    {
        try (Workbook workbook = WorkbookFactory.create(p.toFile())) // instantiating the workbook in try-with-resources
        { // allows it to be auto-closed whether or not target is found, or in the event of an exception
          // (otherwise, cannot manually open xls files after a search while this program (GUI) is open)
            DataFormatter dataFormatter = new DataFormatter();
            for(Sheet sheet: workbook)
            {
                for (Row row: sheet)
                {
                    for(Cell cell: row)
                    {
                        String cellValue = dataFormatter.formatCellValue(cell);
                        if (!caseSensitive)
                        {
                            cellValue = cellValue.toLowerCase();
                        }
                        if (cellValue.contains(target))
                        {
                            return true;
                        }
                    }
                }
            }
        }
        catch (IOException ex)
        {
            System.out.println("From xlHasTarget, IOException: " + ex);
        }
        catch (Exception ex) // --v9 added
        {
            System.out.println("From xlHasTarget: " + ex);
        }
        return false;
    }

    public static void main(String[] args)
    {
        new FindFileOrFolder_v10_a();
    }
}

/*
Items that might be added/addressed later:

--Fields could be encapsulated, but I don’t think it is worth the extra code complexity at the moment.
  Encapsulation could be considered if the API of subsequent versions of the code were to be used
  from external programs.

--Known issue: If the user clicks the Find button again before a previous search has completed,
  possibly with a new search term, although the previous search is discarded now (good),
  a single result sometimes gets through to the new results list.
  (Could make a list/queue of results before printing it from a loop to see if breaking from that
  if Thread.interrupted() true is fully effective, but I would rather allow the user to see the results
  as they are being produced by Files.find(…).)
  Also, if the interrupted search has not yet printed results (as is the case when the search term is
  rare or absent in the examined files/folders), it actually continues processing in the background until it
  sends a result to the terminal method or finishes looking through all files/folders.
  I assume this is because the stream's terminal method above, .allMatch(p -> !Thread.interrupted()),
  does not receive a p (path element).
  Processing is not visible to user, though, just see it in the form of interruption exceptions
  continuing to print in console.
  Another caveat re code from v8-on: though program seems to work well enough (as far as my limited testing
  has ascertained), the way this is done is probably messy (many exceptions thrown, and an Error),
  not sure interrupt() is being used correctly.
  Further work: perhaps change to FileVisitor with Files.walkFileTree(...) in place of Files.find(...)
  for better control?

--Attempt to allow a search to be truncated without exiting the program or starting a new search,
  ideally retaining results found to that point in the display.

--Maybe try to address known issue that searches with search-in-subfolders enabled from some start paths
  near the root, e.g. C:\Users, and even C:\Users\[my user name]\Documents on my system, terminate prematurely.
  (However, though the user does not receive feedback and may not realise that not all files which should be
  found are, the program does not crash and will respond normally to a 'regular' subsequent search.)
  Cause = AccessDeniedException thrown due to denial of access to some folders.
  Might not be able to handle this from Files.find(...), as used currently, or Files.walk(...),
  so perhaps try to instead use walkFileTree + FileVisitor.

--User settings provided by the upper buttons (all but the Find button) could instead be provided by
  dropdowns, radio buttons or checkboxes (+see * below), to make current settings more immediately obvious.
  Checkboxes would be especially suitable as it would be best to allow user to choose which combination
  of file types to include.

--Maybe try to address known issue that searches with search-within-files enabled may be slowed
  if 'inauthentic' files having .doc(/.docx) extensions are encountered.
  See comments in Note1 and Note2 below.
  Way to determine actual file type rather than just parsing extension before sending to Apache POI code?
  Perhaps use walkFileTree + FileVisitor instead of Files.find(...) to skip _vti_cnf folders?
  *Check boxes might be especially useful to allow user to choose search-within for only some
  of the supported file types

--Known issue: Sometimes a given search takes ~10x or more longer than it does if executed subsequently
  once or repeatedly (on my system). Also, the GUI may take a long time to open sometimes.
  Try to find out why and address.

--Test whether making any of the Streams parallel (where possible, i.e. without loss of any beneficial
  output order etc) improves speed (on my system, at least); especially relevant if looking within files.

--Maybe address any other notes left in comments re possible reconfigurations.

--Known issue with the xlHasTarget method that (probably) does not affect the user (output to GUI):
  Often get "Cleaning up unclosed ZipFile for archive: [name of my xls/xlsx file]" written to the console
  (System.err, as it's in red?)
  Looked online but still not certain re the cause and if I can do anything.
  Not a JDK Exception/Error per se, as (now-removed) catch clause for Throwable does not capture it.
*/

/*
Other notes:

--Note1: Searching in one of my large folders (note to self - “Archives”) with search-in-subfolders
  & within-files enabled, got “java.lang.IllegalArgumentException: The document is really a UNKNOWN file”
  ...coming from HWPFDocument via invocation of my docHasTarget(...) method.
  Looking online, this may be applicable:
  https://stackoverflow.com/questions/4996954/error-in-displaying-a-doc-file-reading-that-from-a-document-on-console-in-java
  comments “That's a typical message of an IllegalArgumentException from the HWPFDocument constructor.
  To the point it means that the supplied file is actually a (Wordpad) RTF file whose .rtf extension
  has incorrectly been renamed to .doc.”
  On handling the exception, I note that many of my 'problem' files have the .docx extension and open with
  MS Word but are within _vti_cnf folders that I inadvertently made in my archive folders years ago,
  and/or were transferred from an Apple Mac years ago, sometimes involving RESOURCE.FRK entity.
  Are some files wrongly flagged as not .doc? How much are things slowed up?

--Note2: I assume the org.apache.poi.openxml4j.exceptions.NotOfficeXmlFileException that I see thrown
  (albeit much less frequently from the files in the folder mentioned in Note1) from docxHasTarget(...)
  is somewhat analogous to that of Note1. Also from _vti_cnf folders, as per Note1 above.
*/
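
/*
Sketch for the walkFileTree + FileVisitor idea raised in the notes above. This is only a rough,
untested-against-my-folders illustration under my own assumptions, not part of the program: the class name
WalkFileTreeSketch, the search(...) method and all of its parameters are hypothetical. It matches file/folder
names only (no search within files, no hidden-file check). What it tries to show is that a visitor might
(a) carry on past folders that cannot be read, by returning CONTINUE from visitFileFailed,
(b) skip _vti_cnf folders with SKIP_SUBTREE, and
(c) stop an interrupted search at the next file or folder visited, whether or not anything has matched yet,
    by checking the interrupt flag and returning TERMINATE.
*/

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;
import java.util.function.Consumer;

class WalkFileTreeSketch
{
    // Hypothetical helper: passes to onMatch every file/folder under 'start' (at most maxDepth levels down)
    // whose name contains 'target'.
    public static void search(Path start, int maxDepth, String target, boolean caseSensitive,
                              Consumer<Path> onMatch) throws IOException
    {
        final String needle = caseSensitive ? target : target.toLowerCase();
        Files.walkFileTree(start, EnumSet.noneOf(FileVisitOption.class), maxDepth,
            new SimpleFileVisitor<Path>()
            {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                {
                    if (Thread.currentThread().isInterrupted())
                    {
                        return FileVisitResult.TERMINATE; // stop even if nothing has matched yet
                    }
                    Path name = dir.getFileName(); // null if dir is a root
                    if (name != null && name.toString().equals("_vti_cnf"))
                    {
                        return FileVisitResult.SKIP_SUBTREE; // do not descend into this folder
                    }
                    report(dir);
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                {
                    if (Thread.currentThread().isInterrupted())
                    {
                        return FileVisitResult.TERMINATE;
                    }
                    report(file);
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException ex)
                {
                    // e.g. AccessDeniedException for a protected folder: skip it and carry on
                    return FileVisitResult.CONTINUE;
                }

                private void report(Path p)
                {
                    Path name = p.getFileName();
                    if (name == null)
                    {
                        return;
                    }
                    String text = caseSensitive ? name.toString() : name.toString().toLowerCase();
                    if (text.contains(needle))
                    {
                        onMatch.accept(p);
                    }
                }
            });
    }

    public static void main(String[] args) throws IOException
    {
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        String target = args.length > 1 ? args[1] : "txt";
        search(start, Integer.MAX_VALUE, target, false, p -> System.out.println(p));
    }
}
```

/*
If something like this were tried in the program, the onMatch argument could be the append to
resultsDisplayTextArea, and maxDepth could be walkDepth. Note that when maxDepth is reached, folders at
that level are passed to visitFile rather than preVisitDirectory.
*/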